Stephen Wolfram Writings

When Exactly Will the Eclipse Happen? A Multimillennium Tale of Computation

Stephen Wolfram — Fri, 29 Mar 2024 18:32:14 +0000

Updated and expanded from a post for the eclipse of August 21, 2017.

Preparing for April 8, 2024

On April 8, 2024, there’s going to be a total eclipse of the Sun visible on a line across the US. But when exactly will the eclipse occur at a given location? Being able to predict astronomical events has historically been one of the great triumphs of exact science. But how well can it actually be done now?

The answer is well enough that even though the edge of totality moves at just over 1000 miles per hour, it’s possible to predict when it will arrive at a given location to within perhaps a second. And as a demonstration of this, for the total eclipse back in 2017 we created a website to let anyone enter their geolocation (or address) and then immediately compute when the eclipse would reach them—as well as generate many pages of other information.

It’s an Old Business

These days it’s easy to find out when the next solar eclipse will be; indeed, there’s a function built right into the Wolfram Language that just tells you (in this form the output is the “time of greatest eclipse”):

It’s also easy to find out, and plot, where the region of totality will be:

Or to determine that the whole area of totality (including lots of ocean and some of Canada) will be about a third of the area of the US:

But computing eclipses is not exactly a new business. In fact, the Antikythera device from 2000 years ago even tried to do it—using 37 metal gears to approximate the motion of the Sun and Moon (yes, with the Earth at the center). To me there’s something unsettling—and cautionary—about the fact that the Antikythera device stands as such a solitary piece of technology, forgotten but not surpassed for more than 1600 years.

But right there on the bottom of the device there’s an arm that moves around, and when it points to an Η or Σ marking, it indicates a possible Sun or Moon eclipse. The way of setting dates on the device is a bit funky (after all, the modern calendar wouldn’t be invented for another 1500 years), but if one takes the simulation on the Wolfram Demonstrations Project (which was calibrated back in 2012 when the Demonstration was created), and turns the crank to set the device for April 8, 2024, here’s what one gets:

And, yes, all those gears move so as to line the Moon indicator up with the Sun—and to make the arm on the bottom point right at an H—just as it should for a solar eclipse. It’s amazing to see this computation successfully happen on a device designed 2000 years ago.

Of course the results are a lot more accurate today. Though, strangely, despite all the theoretical science that’s been done, the way we actually compute the position of the Sun and Moon is conceptually very much like the gears—and effectively epicycles—of the Antikythera device. It’s just that now we have the digital equivalent of hundreds of thousands of gears.

Why Do Eclipses Happen?

A total solar eclipse occurs when the Moon gets in front of the Sun from the point of view of a particular location on the Earth. And it so happens that at this point in the Earth’s history the Moon can just block the Sun because it has almost exactly the same angular diameter in the sky as the Sun (about 0.5° or 30 arc-minutes).

So when does the Moon get between the Sun and the Earth? Well, basically every time there’s a new moon (i.e. once every lunar month). But we know there isn’t an eclipse every month. So how come?

Well, actually, in the analogous situation of Ganymede and Jupiter, there is an eclipse every time Ganymede goes around Jupiter (which happens to be about once per week). Like the Earth, Jupiter’s orbit around the Sun lies in a particular plane (the “plane of the ecliptic”). And it turns out that Ganymede’s orbit around Jupiter also lies in essentially the same plane. So every time Ganymede reaches the “new moon” position (or, in official astronomy parlance, when it’s aligned “in syzygy”—pronounced sizz-ee-gee), it’s in the right place to cast its shadow onto Jupiter, and to eclipse the Sun wherever that shadow lands. (From Jupiter, Ganymede appears about 3 times the size of the Sun.)
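The angular sizes mentioned here are easy to check. Here's a small Python sketch that computes apparent angular diameters from rounded mean radii and distances. The constants are assumed round figures rather than ephemeris data, and Ganymede's distance is measured from Jupiter's center, so the results are approximate.

```python
import math

def angular_diameter_deg(radius_km: float, distance_km: float) -> float:
    """Apparent angular diameter (degrees) of a sphere of the given radius at the given distance."""
    return math.degrees(2 * math.asin(radius_km / distance_km))

# Approximate mean values in km (assumed round figures, not ephemeris data).
SUN_RADIUS = 696_000
MOON_RADIUS = 1_737.4
GANYMEDE_RADIUS = 2_634
EARTH_SUN = 149_597_871          # 1 astronomical unit
EARTH_MOON = 384_400
JUPITER_SUN = 5.2 * EARTH_SUN    # Jupiter's mean distance from the Sun
JUPITER_GANYMEDE = 1_070_400     # radius of Ganymede's orbit, from Jupiter's center

sun_from_earth = angular_diameter_deg(SUN_RADIUS, EARTH_SUN)
moon_from_earth = angular_diameter_deg(MOON_RADIUS, EARTH_MOON)
sun_from_jupiter = angular_diameter_deg(SUN_RADIUS, JUPITER_SUN)
ganymede_from_jupiter = angular_diameter_deg(GANYMEDE_RADIUS, JUPITER_GANYMEDE)
ratio = ganymede_from_jupiter / sun_from_jupiter

print(f"Sun seen from Earth:  {sun_from_earth:.3f} deg")
print(f"Moon seen from Earth: {moon_from_earth:.3f} deg")
print(f"From Jupiter, Ganymede looks {ratio:.1f}x as wide as the Sun")
```

With these numbers the Sun comes out at about 0.53° and the Moon at about 0.52° as seen from Earth, while from Jupiter Ganymede appears a bit under 3 times as wide as the Sun (slightly more if measured from the cloud tops rather than the center).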

But our Moon is different. Its orbit doesn’t lie in the plane of the ecliptic. Instead, it’s inclined at about 5°. (How it got that way is unknown, but it’s presumably related to how the Moon was formed.) But that 5° is what makes eclipses so comparatively rare: they can only happen when there’s a “new moon configuration” (syzygy) right at a time when the Moon’s orbit passes through the plane of the ecliptic.

To show what’s going on, let’s draw an exaggerated version of everything. Here’s the Moon going around the Earth, colored red whenever it’s close to the plane of the ecliptic:

Now let’s look at what happens over the course of about a year. We’re showing a dot for where the Moon is each day. And the dot is redder if the Moon is closer to the plane of the ecliptic that day. (Note that if this were drawn to scale, you’d barely be able to see the Moon’s orbit, and it wouldn’t ever seem to go backwards like it does here.)

Now we can start to see how eclipses work. The basic point is that there’s a solar eclipse whenever the Moon is both positioned between the Earth and the Sun, and it’s in the plane of the ecliptic. In the picture, those two conditions correspond to the Moon being as far as possible towards the center, and as red as possible. So far we’re only showing the position of the (exaggerated) Moon once per day. But to make things clearer, let’s show it four times a day—and now prune out cases where the Moon isn’t at least roughly lined up with the Sun:

And now we can see that at least in this particular case, there are two points (indicated by arrows) where the Moon is lined up and in the plane of the ecliptic (so shown in red)—and these points will then correspond to solar eclipses.

In different years, the picture will look slightly different, essentially because the Moon is starting at a different place in its orbit at the beginning of the year. Here are schematic pictures for a few successive years:

It’s not so easy to see exactly when eclipses occur here—and it’s also not possible to tell which are total eclipses where the Moon is exactly lined up, and which are only partial eclipses. But there’s at least an indication, for example, that there are “eclipse seasons” in different parts of the year where eclipses happen.

OK, so what does the real data look like? Here’s a plot for 20 years in the past and 20 years in the future, showing the actual days in each year when total and partial solar eclipses occur (the small dots everywhere indicate new moons):

The reason for the “drift” between successive years is just that the lunar month (29.53 days) doesn’t divide evenly into the year: the Moon doesn’t go through a whole number of cycles of phases in the course of a year, so at the beginning of each new year it’s at a different phase. But as the picture makes clear, there’s quite a lot of regularity in the general times at which eclipses occur. There are usually 2 solar eclipses in a given year, though there can be more (in about 0.2% of years there are as many as 5, most recently in 1935).
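The basic mechanism can be sketched with a toy model. It assumes only mean values for the synodic month (new moon to new moon) and the draconic month (node to node), plus a fixed, assumed ecliptic limit of 16 degrees: a new moon counts as a possible solar eclipse if the Moon lies within that angle of one of the two nodes of its orbit. It ignores the Moon's elliptical orbit and everything else that makes the real limit vary, so it says nothing about whether a given eclipse is total or partial.

```python
SYNODIC_MONTH = 29.530589   # mean days from new moon to new moon
DRACONIC_MONTH = 27.212221  # mean days from node to node
ECLIPSE_LIMIT_DEG = 16.0    # assumed rough limit; the real value varies (roughly 15 to 18 degrees)

def degrees_from_node(t_days: float) -> float:
    """Angle of the Moon along its orbit from the nearest node (ascending or descending).

    Assumes, for simplicity, that the Moon crosses a node at t = 0.
    """
    arg_latitude = (t_days / DRACONIC_MONTH) % 1.0 * 360.0
    d = arg_latitude % 180.0          # the two nodes are 180 degrees apart
    return min(d, 180.0 - d)

def eclipse_new_moons(n_months: int, limit_deg: float = ECLIPSE_LIMIT_DEG) -> list:
    """Indices of new moons close enough to a node for a solar eclipse in this toy model."""
    return [k for k in range(n_months)
            if degrees_from_node(k * SYNODIC_MONTH) < limit_deg]

years = 100
months = int(years * 365.25 / SYNODIC_MONTH)
hits = eclipse_new_moons(months)
gaps = sorted({b - a for a, b in zip(hits, hits[1:])})
print(f"{len(hits) / years:.2f} possible solar eclipses per year")
print("gaps between them, in lunar months:", gaps)
```

Because each new moon falls only about 30.7 degrees further along relative to the nodes than the previous one, less than the 32-degree-wide window, no eclipse season can be skipped. The model gives a bit over 2 possible eclipses per year, separated by 1, 5 or 6 lunar months, with eclipse seasons recurring roughly every 173 days.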

To see more detail about eclipses, let’s plot the time differences (in fractional years) between all successive solar eclipses for 100 years in the past and 100 years in the future:

And now let’s plot the same time differences, but just for total solar eclipses:

There’s obviously a fair amount of overall regularity here, but there’s also a lot of fine structure and many irregularities. And being able to correctly predict all these details has basically taken science the better part of a few thousand years.

Ancient History

It’s hard not to notice an eclipse, and presumably even from the earliest times people did. But were eclipses just reflections—or omens—associated with random goings-on in the heavens, perhaps in some kind of soap opera among the gods? Or were they things that could somehow be predicted?

A few thousand years ago, it wouldn’t have been clear what people like astrologers could conceivably predict. When will the Moon be at a certain place in the sky? Will it rain tomorrow? What will the price of barley be? Who will win a battle? Even now, we’re not sure how predictable all of these are. But the one clear case where prediction and exact science have triumphed is astronomy.

At least as far as the Western tradition is concerned, it all seems to have started in ancient Babylon—where for many hundreds of years, careful observations were made, and, in keeping with the ways of that civilization, detailed records were kept. And even today we still have thousands of daily official diary entries written in what look like tiny chicken scratches preserved on little clay tablets. “Night of the 14th: Cold north wind. Moon was in front of α Leonis. From 15th to 20th river rose 1/2 cubit. Barley was 1 kur 5 siit. 25th, last part of night, moon was 1 cubit 8 fingers behind ε Leonis. 28th, 74° after sunrise, solar eclipse…”

If one looks at what happens on a particular day, one probably can’t tell much. But by putting observations together over years or even hundreds of years, it’s possible to see all sorts of repetitions and regularities. And back in Babylonian times the idea arose of using these to construct an ephemeris—a systematic table that said where a particular heavenly body such as the Moon was expected to be at any particular time.

(Needless to say, reconstructing Babylonian astronomy is a complicated exercise in decoding what’s by now basically an alien culture. A key figure in this effort was a certain Otto Neugebauer, who happened to work down the hall from me at the Institute for Advanced Study in Princeton in the early 1980s. I would see him almost every day—a quiet white-haired chap, with a twinkle in his eye—and just sometimes I’d glimpse his huge filing system of index cards which I now realize was at the center of understanding Babylonian astronomy.)

One thing the Babylonians did was to measure surprisingly accurately the repetition period for the phases of the Moon—the so-called synodic month (or “lunation period”) of about 29.53 days. And they noticed that 235 synodic months was very close to 19 years—so that about every 19 years, dates and phases of the Moon repeat their alignment, forming a so-called Metonic cycle (named after Meton of Athens, who described it in 432 BC).
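The Metonic coincidence is easy to check with modern mean values (assumed here: a synodic month of 29.530589 days and a tropical year of 365.242190 days):

```python
SYNODIC_MONTH = 29.530589   # mean synodic month, days (modern value)
TROPICAL_YEAR = 365.242190  # mean tropical year, days (modern value)

cycle_months = 235 * SYNODIC_MONTH
cycle_years = 19 * TROPICAL_YEAR
mismatch_hours = (cycle_months - cycle_years) * 24

print(f"235 synodic months: {cycle_months:.3f} days")
print(f"19 tropical years:  {cycle_years:.3f} days")
print(f"difference: about {mismatch_hours:.1f} hours")
```

The two spans differ by only about 2 hours, so the cycle slips by a full day only after roughly 220 years.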

It probably helps that the random constellations in the sky form a good pattern against which to measure the precise position of the Moon (it reminds me of the modern fashion of wearing fractals to make motion capture for movies easier). But the Babylonians noticed all sorts of details of the motion of the Moon. They knew about its “anomaly”: its periodic speeding up and slowing down in the sky (now known to be a consequence of its slightly elliptical orbit). And they measured the average period of this—the so-called anomalistic month—to be about 27.55 days. They also noticed that the Moon went above and below the plane of the ecliptic (now known to be because of the inclination of its orbit)—with an average period (the so-called draconic month) that they measured as about 27.21 days.

And by 400 BC they’d noticed that every so-called saros of about 18 years 11 days all these different periods essentially line up (223 synodic months, 239 anomalistic months and 242 draconic months)—with the result that the Moon ends up at about the same position relative to the Sun. And this means that if there was an eclipse at one saros, then one can make the prediction that there’s going to be an eclipse at the next saros too.
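A quick check of the saros alignment, using modern mean values for the three kinds of month (assumed here; they refine the rounded 29.53, 27.55 and 27.21 days quoted above):

```python
SYNODIC = 29.530589      # days, new moon to new moon
ANOMALISTIC = 27.554550  # days, perigee to perigee
DRACONIC = 27.212221     # days, node to node

periods = {
    "223 synodic months": 223 * SYNODIC,
    "239 anomalistic months": 239 * ANOMALISTIC,
    "242 draconic months": 242 * DRACONIC,
}
for name, days in periods.items():
    print(f"{name:>24}: {days:.3f} days")

spread_days = max(periods.values()) - min(periods.values())
whole_years, extra_days = divmod(periods["223 synodic months"], 365.25)
print(f"all three agree to within {spread_days * 24:.1f} hours")
print(f"one saros is about {int(whole_years)} years and {extra_days:.1f} days")
```

All three products land within about 5 hours of 6585.3 days, which is a little under 18 years and 11 days of 365.25 days each.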

When one’s absolutely precise about it, there are all sorts of effects that prevent precise repetition at each saros. But over timescales of more than 1300 years, there are in fact still strings of eclipses separated from each other by one saros. (Over the course of such a saros series, the locations of the eclipses effectively scan across the Earth; the upcoming eclipse is number 30 in a series of 71 that began in 1501 AD with an eclipse near the North Pole and will end in 2763 AD with an eclipse near the South Pole.)

Any given moment in time will be in the middle of quite a few saros series (right now it’s 40)—and successive eclipses will always come from different series. But knowing about the saros cycle is a great first step in predicting eclipses—and it’s for example what the Antikythera device uses. In a sense, it’s a quintessential piece of science: take many observations, then synthesize a theory from them, or at least a scheme for computation.

It’s not clear what the Babylonians thought about abstract, formal systems. But the Greeks were definitely into them. And by 300 BC Euclid had defined his abstract system for geometry. So when someone like Ptolemy did astronomy, they did it a bit like Euclid—effectively taking things like the saros cycle as axioms, and then proving from them often surprisingly elaborate geometrical theorems, such as that there must be at least two solar eclipses in a given year.

Ptolemy’s Almagest from around 150 AD is an impressive piece of work, containing among many other things some quite elaborate procedures—and explicit tables—for predicting eclipses. (Yes, even in the later printed version, numbers are still represented confusingly by letters, as they always were in ancient Greek.)

In Ptolemy’s astronomy, Earth was assumed to be at the center of everything. But in modern terms that just meant he was choosing to use a different coordinate system—which didn’t affect most of the things he wanted to do, like working out the geometry of eclipses. And unlike the mainline Greek philosophers he wasn’t trying to make a fundamental theory of the world; he just wanted whatever epicycles and so on he needed to fit what he observed.

The Dawn of Modern Science

For more than a thousand years Ptolemy’s theory of the Moon defined the state of the art. In the 1300s Ibn al-Shatir revised Ptolemy’s models, achieving somewhat better accuracy. In 1472 Regiomontanus (Johannes Müller), systematizer of trigonometry, published more complete tables as part of his launch of what was essentially the first-ever scientific publishing company. But even in 1543 when Nicolaus Copernicus introduced his Sun-centered model of the solar system, the results he got were basically the same as Ptolemy’s, even though his underlying description of what was going on was quite different.

It’s said that Tycho Brahe got interested in astronomy in 1560 at age 13 when he saw a solar eclipse that had been predicted—and over the next several decades his careful observations uncovered several effects in the motion of the Moon (such as speeding up just before a full moon), which eventually resulted in perhaps a factor of 5 improvement in the prediction of its position. To Tycho, eclipses were key tests; he measured them carefully and worked hard to predict their timing to better than a few hours. (He himself never saw a total solar eclipse, only partial ones.)

Armed with Tycho’s observations, Johannes Kepler developed his description of orbits as ellipses—introducing concepts like inclination and eccentric anomaly—and in 1627 finally produced his Rudolphine Tables, which got right a lot of things that had been got wrong before, and included all sorts of detailed tables of lunar positions, as well as vastly better predictions for eclipses.
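Kepler's description leaves one genuinely computational step: turning elapsed time into a position on the ellipse. In modern notation the mean anomaly M (which grows uniformly with time) is related to the eccentric anomaly E by Kepler's equation M = E − e sin E, where e is the eccentricity, and this has no closed-form solution for E. Here's a minimal Python sketch that solves it with Newton's method, a modern choice rather than Kepler's own procedure; the eccentricity 0.0549 is an assumed round value close to the Moon's mean orbital eccentricity.

```python
import math

def eccentric_anomaly(mean_anomaly: float, e: float, tol: float = 1e-12) -> float:
    """Solve Kepler's equation M = E - e*sin(E) for E (radians), for 0 <= e < 1, by Newton's method."""
    M = math.fmod(mean_anomaly, 2 * math.pi)
    E = M if e < 0.8 else math.pi  # common starting guess
    for _ in range(50):
        step = (E - e * math.sin(E) - M) / (1 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E

def true_anomaly(E: float, e: float) -> float:
    """Angle from the point of closest approach, as seen from the focus of the ellipse."""
    return 2 * math.atan2(math.sqrt(1 + e) * math.sin(E / 2),
                          math.sqrt(1 - e) * math.cos(E / 2))

# Example: eccentricity near the Moon's mean value, a quarter of a period after perigee.
e = 0.0549
M = math.pi / 2
E = eccentric_anomaly(M, e)
print(f"E = {math.degrees(E):.3f} deg, true anomaly = {math.degrees(true_anomaly(E, e)):.3f} deg")
```

For e = 0 the orbit is a circle and E equals M exactly; larger eccentricities just take a few more iterations.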

Using Kepler’s Rudolphine Tables (and a couple of pages of calculations) the first known actual map of a solar eclipse was published in 1654. And while there are some charming inaccuracies in overall geography, the geometry of the eclipse isn’t too bad.

Whether it was Ptolemy’s epicycles or Kepler’s ellipses, there were plenty of calculations to do in determining the motions of heavenly bodies (and indeed the first known mechanical calculator—excepting the Antikythera device—was developed by a friend of Kepler’s, presumably for the purpose). But there wasn’t really a coherent underlying theory; it was more a matter of describing effects in ways that could be used to make predictions.

So it was a big step forward in 1687 when Isaac Newton published his Principia, and claimed that with his laws for motion and gravity it should be possible—essentially from first principles—to calculate everything about the motion of the Moon. (Charmingly, in his “Theory of the World” section he simply asserts as his Proposition XXII “That all the motions of the Moon… follow from the principles which we have laid down.”)

Newton was proud of the fact that he could explain all sorts of known effects on the basis of his new theory. But when it came to actually calculating the detailed motion of the Moon he had a frustrating time. And even after several years he still couldn’t get the right answer—in later editions of the Principia adding the admission that actually “The apse of the Moon is about twice as swift” (i.e. his answer was wrong by a factor of 2).

Still, in 1702 Newton was happy enough with his results that he allowed them to be published, in the form of a 20-page booklet on the “Theory of the Moon”, which proclaimed that “By this Theory, what by all Astronomers was thought most difficult and almost impossible to be done, the Excellent Mr. Newton hath now effected, viz. to determine the Moon’s Place even in her Quadratures, and all other Parts of her Orbit, besides the Syzygys, so accurately by Calculation, that the Difference between that and her true Place in the Heavens shall scarce be two Minutes…”

Newton didn’t explain his methods (and actually it’s still not clear exactly what he did, or how mathematically rigorous it was or wasn’t). But his booklet effectively gave a step-by-step algorithm to compute the position of the Moon. He didn’t claim it worked “at the syzygys” (i.e. when the Sun, Moon and Earth are lined up for an eclipse)—though his advertised error of two arc-minutes was still much smaller than the angular size of the Moon in the sky.

But it wasn’t eclipses that were the focus then; it was a very practical problem of his day: knowing the location of a ship out in the open ocean. It’s possible to determine what latitude you’re at just by measuring how high the Sun gets in the sky. But to determine longitude you have to correct for the rotation of the Earth—and to do that you have to accurately keep track of time. But back in Newton’s day, the clocks that existed simply weren’t accurate enough, especially when they were being tossed around on a ship.

But particularly after various naval accidents, the problem of longitude was deemed important enough that the British government in 1714 established a “Board of Longitude” to offer prizes to help get it solved. One early suggestion was to use the regularity of the moons of Jupiter discovered by Galileo as a way to tell time. But it seemed that a simpler solution (not requiring a powerful telescope) might just be to measure the position of our Moon, say relative to certain fixed stars—and then to back-compute the time from this.

But to do this one had to have an accurate way to predict the motion of the Moon—which is what Newton was trying to provide. In reality, though, it took until the 1760s before tables were produced that were accurate enough to be able to determine time to within a minute (and thus distance to within 15 miles or so). And it so happens that right around the same time a marine chronometer was invented that was directly able to keep good time.
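The arithmetic behind that figure is short. The Earth turns 360 degrees in 24 hours, so each minute of clock error is 0.25 degrees of longitude; since one arc-minute along the equator is about one nautical mile, that is about 15 nautical miles there, shrinking with the cosine of the latitude. A minimal sketch (the function name is just illustrative):

```python
import math

def longitude_error(clock_error_minutes: float, latitude_deg: float = 0.0) -> tuple:
    """Return (longitude error in degrees, east-west position error in nautical miles)."""
    # The Earth turns 360 degrees in 24 hours: 15 degrees per hour, 0.25 degrees per minute.
    degrees = clock_error_minutes * 360.0 / (24 * 60)
    # One arc-minute of longitude is about one nautical mile at the equator,
    # shrinking with the cosine of the latitude.
    nautical_miles = degrees * 60.0 * math.cos(math.radians(latitude_deg))
    return degrees, nautical_miles

for lat in (0, 30, 50):
    deg, nmi = longitude_error(1.0, lat)
    print(f"1 minute of clock error at latitude {lat}: {deg} deg, about {nmi:.1f} nautical miles")
```

At 50 degrees latitude, for example, the same one-minute error comes to under 10 nautical miles.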

The Three-Body Problem

One of Newton’s great achievements in the Principia was to solve the so-called two-body problem, and to show that with an inverse square law of gravity the orbit of one body around another must always be what Kepler had said: an ellipse.

In a first approximation, one can think of the Moon as just orbiting the Earth in a simple elliptical orbit. But what makes everything difficult is that that’s just an approximation, because in reality there’s also a gravitational pull on the Moon from the Sun. And because of this, the Moon’s orbit is no longer a simple fixed ellipse—and in fact it ends up being much more complicated. There are a few definite effects one can describe and reason about. The ellipse gets stretched when the Earth is closer to the Sun in its own orbit. The orientation of the ellipse precesses like a top as a result of the influence of the Sun. But there’s no way in the end to work out the orbit by pure reasoning—so there’s no choice but to go into the mathematics and start solving the equations of the three-body problem.

In many ways this represented a new situation for science. In the past, one hadn’t ever been able to go far without having to figure out new laws of nature. But here the underlying laws were supposedly known, courtesy of Newton. Yet even given these laws, there was difficult mathematics involved in working out the behavior they implied.

Over the course of the 1700s and 1800s the effort to try to solve the three-body problem and determine the orbit of the Moon was at the center of mathematical physics—and attracted a veritable who’s who of mathematicians and physicists.

An early entrant was Leonhard Euler, who developed methods based on trigonometric series (including much of our current notation for such things), and whose works contain many immediately recognizable formulas:

In the mid-1740s there was a brief flap—also involving Euler’s “competitors” Clairaut and d’Alembert—about the possibility that the inverse-square law for gravity might be wrong. But the problem turned out to be with the calculations, and by 1748 Euler was using sums of about 20 trigonometric terms and proudly proclaiming that the tables he’d produced for the three-body problem had predicted the time of a total solar eclipse to within minutes. (Actually, he had said there’d be 5 minutes of totality, whereas in reality there was only 1—but he blamed this error on incorrect coordinates he’d been given for Berlin.)

Mathematical physics moved rapidly over the next few decades, with all sorts of now-famous methods being developed, notably by people like Lagrange. And by the 1770s, for example, Lagrange’s work was looking just like it could have come from a modern calculus book (or from a Wolfram|Alpha step-by-step solution):

Particularly in the hands of Laplace there was increasingly obvious success in deriving the observed phenomena of what he called “celestial mechanics” from mathematics—and in establishing the idea that mathematics alone could indeed generate new results in science.

At a practical level, measurements of things like the position of the Moon had always been much more accurate than calculations. But now they were becoming more comparable—driving advances in both. Meanwhile, there was increasing systematization in the production of ephemeris tables. And 1767 saw the start of annual publication of what was for many years the standard: the British Nautical Almanac.

The almanac quoted the position of the Moon to the arc-second, and systematically achieved at least arc-minute accuracy. The primary use of the almanac was for navigation (and it was what started the convention of using Greenwich as the “prime meridian” for measuring time). But right at the front of each year’s edition were the predicted times of the eclipses for that year—in 1767 just two solar eclipses:

The Math Gets More Serious

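At a mathematical level, the three-body problem is about solving a system of three coupled (vector) ordinary differential equations that give the positions of the three bodies as a function of time. If the positions of bodies $i = 1, 2, 3$, with masses $m_i$, are represented in standard 3D Cartesian coordinates $\mathbf{r}_i = \{x_i, y_i, z_i\}$, the equations can be stated in the form:

$$\ddot{\mathbf{r}}_i = G \sum_{j \neq i} m_j \, \frac{\mathbf{r}_j - \mathbf{r}_i}{\lvert \mathbf{r}_j - \mathbf{r}_i \rvert^{3}}, \qquad i = 1, 2, 3$$

where $G$ is the gravitational constant and the dots denote derivatives with respect to time.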
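These equations can also be integrated numerically, step by step, for any particular initial conditions. Here's a minimal Python sketch using the velocity-Verlet (leapfrog) scheme in units where G = 1. The step size, the duration, and the choice of the well-known "figure-eight" periodic orbit for three equal masses (with its commonly quoted initial conditions) are assumptions made for illustration; checking that the total energy stays nearly constant is a simple sanity test on the integration.

```python
import math

def accelerations(pos, masses, G=1.0):
    """Acceleration of each body from Newton's inverse-square law, summed over the other bodies."""
    acc = [[0.0, 0.0, 0.0] for _ in pos]
    for i, ri in enumerate(pos):
        for j, rj in enumerate(pos):
            if i == j:
                continue
            d = [rj[k] - ri[k] for k in range(3)]
            r3 = math.dist(ri, rj) ** 3
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] / r3
    return acc

def leapfrog(pos, vel, masses, dt, steps, G=1.0):
    """Advance positions and velocities with the velocity-Verlet (leapfrog) scheme."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    acc = accelerations(pos, masses, G)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]   # half kick
                pos[i][k] += dt * vel[i][k]         # drift
        acc = accelerations(pos, masses, G)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]   # half kick
    return pos, vel

def total_energy(pos, vel, masses, G=1.0):
    """Kinetic plus gravitational potential energy; it should stay (nearly) constant."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, vel))
    potential = -sum(G * masses[i] * masses[j] / math.dist(pos[i], pos[j])
                     for i in range(len(pos)) for j in range(i + 1, len(pos)))
    return kinetic + potential

# Three equal masses started on the "figure-eight" periodic orbit (G = 1).
masses = [1.0, 1.0, 1.0]
x1 = [0.97000436, -0.24308753, 0.0]
v3 = [-0.93240737, -0.86473146, 0.0]
pos0 = [x1, [-c for c in x1], [0.0, 0.0, 0.0]]
vel0 = [[-c / 2 for c in v3], [-c / 2 for c in v3], v3]

pos1, vel1 = leapfrog(pos0, vel0, masses, dt=0.001, steps=2000)
e0, e1 = total_energy(pos0, vel0, masses), total_energy(pos1, vel1, masses)
print(f"relative energy drift after t = 2: {abs(e1 - e0) / abs(e0):.2e}")
```

Leapfrog is a common choice for problems like this because, as a symplectic method, it tends to keep energy errors bounded over long runs rather than letting them drift steadily.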

The {x, y, z} coordinates here aren’t, however, what traditionally show up in astronomy. For example, in describing the position of the Moon one might use longitude and latitude on a sphere around the Earth. Or, given that one knows the Moon has a roughly elliptical orbit, one might instead choose to describe its motions by variables that are based on deviations from such an orbit. In principle it’s just a matter of algebraic manipulation to restate the equations with any given choice of variables. But in practice what comes out is often long and complex—and can lead to formulas that fill many pages.

But, OK, so what are the best kinds of variables to use for the three-body problem? Maybe they should involve relative positions of pairs of bodies. Or relative angles. Or maybe positions in various kinds of rotating coordinate systems. Or maybe quantities that would be constant in a pure two-body problem. Over the course of the 1700s and 1800s many treatises were written exploring different possibilities.

But in essentially all cases the ultimate approach to the three-body problem was the same. Set up the problem with the chosen variables. Identify parameters that, if set to zero, would make the problem collapse to some easy-to-solve form. Then do a series expansion in powers of these parameters, keeping just some number of terms.

The calculations were difficult, and people’s results often didn’t agree. And for example in 1843 Ada Lovelace noted that “In the solution of the famous problem of the Three Bodies, there are, out of about 295 coefficients of lunar perturbations [that had recently been computed]… only 101… agree precisely both in signs and in amount [with previous works]” (going on to say, rather farsightedly, that this was something the Analytical Engine would be able to solve).

By the 1860s Charles Delaunay had, however, spent 20 years developing the most extensive theory of the Moon using series expansions. He’d identified five parameters with respect to which to do his expansions (eccentricities, inclinations, and ratios of orbit sizes)—and in the end he generated about 1800 pages like this (yes, he really needed Mathematica!):

But the sad fact was that despite all this effort, he didn’t get terribly good answers. And eventually it became clear why. The basic problem was that Delaunay wanted to represent his results in terms of functions like sin and cos. But in his computations, he often wanted to do series expansions with respect to the frequencies of those functions. Here’s a minimal case:
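As an illustration of the kind of expansion involved (assuming, for the sake of example, a sinusoid whose frequency ω is shifted by a small amount δ, expanded in powers of δ), the first few terms are:

```latex
\sin\big((\omega+\delta)\,t\big) = \sin(\omega t) + \delta\, t \cos(\omega t) - \frac{\delta^{2} t^{2}}{2} \sin(\omega t) - \frac{\delta^{3} t^{3}}{6} \cos(\omega t) + O(\delta^{4})
```

Each successive term carries a higher power of t as well as of δ.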

And here’s the problem. Take a look even at the second term. Yes, the δ parameter may be small. But what about the parameter that stands for time? If you don’t want to make predictions very far out, that’ll stay small. But what if you want to figure out what will happen further in the future?

Well, eventually that term will get big. And higher-order terms will get even bigger. But unless the Moon is going to escape its orbit or something, the final mathematical expressions that represent its position can’t have values that are too big. So in these expressions the so-called secular terms, which grow with time, must somehow cancel out.
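A short numerical check shows the effect, assuming an illustrative sinusoid sin((ω + δ)t) and keeping only its zeroth- and first-order terms in δ. The first-order term δ·t·cos(ωt) is a secular term: it is tiny at first, but nothing stops it from growing, while the exact function always stays between −1 and 1.

```python
import math

def exact(omega: float, delta: float, t: float) -> float:
    """A sinusoid whose frequency omega is shifted by a small delta."""
    return math.sin((omega + delta) * t)

def truncated(omega: float, delta: float, t: float) -> float:
    """Its expansion in powers of delta, cut off after the first-order term.

    The first-order term delta * t * cos(omega * t) is a secular term: it grows with t.
    """
    return math.sin(omega * t) + delta * t * math.cos(omega * t)

omega, delta = 1.0, 0.01
for t in (1, 10, 100, 1000):
    err = abs(truncated(omega, delta, t) - exact(omega, delta, t))
    print(f"t = {t:>4}: error of truncated series = {err:.4f}")
```

With δ = 0.01 the truncated series is accurate to a few parts in 100,000 at t = 1, but by t = 1000 its error is larger than the whole range of the true function.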

But the problem is that at any given order in the series expansion, there’s no guarantee that will happen in a numerically useful way. And in Delaunay’s case—even though with immense effort he often went to 7th order or beyond—it didn’t.

One nice feature of Delaunay’s computation was that it was in a sense entirely algebraic: everything was done symbolically, and only at the very end were actual numerical values of parameters substituted in.

But even before Delaunay, Peter Hansen had taken a different approach—substituting numbers as soon as he could, and dropping terms based on their numerical size rather than their symbolic form. His presentations look less pure (notice, for example, how the time in years appears explicitly all over the place), and it’s more difficult to tell what’s going on. But as a practical matter, his results were much better, and in fact were used for many national almanacs from about 1862 to 1922, achieving errors as small as 1 or 2 arc-seconds at least over periods of a decade or so. (Over longer periods, the errors could rapidly increase because of terms that had been dropped as a result of what amounted to numerical accidents.)

Both Delaunay and Hansen tried to represent orbits as series of powers and trigonometric functions (so-called Poisson series). But in the 1870s, George Hill in the US Nautical Almanac Office proposed instead using as a basis numerically computed functions that came from solving an equation for two-body motion with a periodic driving force of roughly the kind the Sun exerts on the Moon’s orbit. A large-scale effort was mounted, and starting in 1892 Ernest W. Brown (who had moved to the US, but had been a student of George Darwin, Charles Darwin’s physicist son) took charge of the project and in 1918 produced what would stand for many years as the definitive “Tables of the Motion of the Moon”.

Brown’s tables consist of hundreds of pages like this—ultimately representing the position of the Moon as a combination of about 1400 terms with very precise coefficients:

He says right at the beginning that the tables aren’t particularly intended for unique events like eclipses, but then goes ahead and does a “worked example” of computing an eclipse from 381 BC, reported by Ptolemy:

It was an impressive indication of how far things had come. But ironically enough the final presentation of Brown’s tables had the same sum-of-trigonometric-functions form that one would get from having lots of epicycles. At some level it’s not surprising, because any function can ultimately be represented by epicycles, just as it can be represented by a Fourier or other series. But it’s a strange quirk of history that such similar forms were used.
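The general point about epicycles can be made concrete: any closed curve, sampled at evenly spaced times, can be written exactly as a sum of uniformly rotating circles via the discrete Fourier transform. Here's a minimal Python sketch on a made-up curve (an ellipse plus a small fast wobble, not lunar data); it is only meant to illustrate the equivalence, not how Brown's tables were actually constructed.

```python
import cmath
import math

def epicycle_coefficients(points):
    """Discrete Fourier transform of a closed curve sampled as complex numbers x + iy.

    Coefficient c_k is one 'epicycle': a circle of radius |c_k| that turns k times per period.
    """
    n = len(points)
    return {k: sum(z * cmath.exp(-2j * math.pi * k * m / n) for m, z in enumerate(points)) / n
            for k in range(-(n // 2), n - n // 2)}

def position(coeffs, s, keep):
    """Point at phase s (0 <= s < 1) built from only the `keep` largest epicycles."""
    largest = sorted(coeffs.items(), key=lambda kc: abs(kc[1]), reverse=True)[:keep]
    return sum(c * cmath.exp(2j * math.pi * k * s) for k, c in largest)

# Illustrative curve: an ellipse plus a small fast wobble (made-up numbers, not lunar data).
n = 64
curve = [complex(math.cos(2 * math.pi * m / n), 0.6 * math.sin(2 * math.pi * m / n))
         + 0.1 * cmath.exp(2j * math.pi * 5 * m / n) for m in range(n)]
coeffs = epicycle_coefficients(curve)
for keep in (1, 2, 3):
    err = max(abs(position(coeffs, m / n, keep) - z) for m, z in enumerate(curve))
    print(f"{keep} epicycle(s): max error {err:.2e}")
```

Keeping one circle leaves a maximum error of 0.3 and two circles 0.1, while three reproduce this particular curve essentially exactly, since it happens to contain just three frequencies.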

Can the Three-Body Problem Be Solved?

It’s all well and good that one can find approximations to the three-body problem, but what about just finding an outright solution—like as a mathematical formula? Even in the 1700s, there’d been some specific solutions found—like Euler’s collinear configuration, and Lagrange’s equilateral triangle. But a century later, no further solutions had been found—and finding a complete solution to the three-body problem was beginning to seem as hopeless as trisecting an angle, solving the quintic, or making a perpetual motion machine. (That sentiment was reflected for example in a letter Charles Babbage wrote Ada Lovelace in 1843 mentioning the “horrible problem [of] the three bodies”—even though this letter was later misinterpreted by Ada’s biographers to be about a romantic triangle, not the three-body problem of celestial mechanics.)

In contrast to the three-body problem, what seemed to make the two-body problem tractable was that its solutions could be completely characterized by “constants of the motion”—quantities that stay constant with time (in this case notably the direction of the axis of the ellipse). So for many years one of the big goals with the three-body problem was to find constants of the motion.

In 1887, though, Heinrich Bruns showed that there couldn’t be any such constants of the motion, at least expressible as algebraic functions of the standard {x, y, z} position and velocity coordinates of the three bodies. Then in the mid-1890s Henri Poincaré showed that actually there couldn’t be any constants of the motion that were expressible as any analytic functions of the positions, velocities and mass ratios.

One reason that was particularly disappointing at the time was that it had been hoped that somehow constants of the motion would be found in n-body problems that would lead to a mathematical proof of the long-term stability of the solar system. And as part of his work, Poincaré also saw something else: that at least in particular cases of the three-body problem, there was arbitrarily sensitive dependence on initial conditions—implying that even tiny errors in measurement could be amplified to arbitrarily large changes in predicted behavior (the classic “chaos theory” phenomenon).

But having discovered that particular solutions to the three-body problem could have this kind of instability, Poincaré took a different approach that would actually be characteristic of much of pure mathematics going forward: he decided to look not at individual solutions, but at the space of all possible solutions. And needless to say, he found that for the three-body problem, this was very complicated—though in his efforts to analyze it he invented the field of topology.

Poincaré’s work all but ended efforts to find complete solutions to the three-body problem. It also seemed to some to explain why the series expansions of Delaunay and others hadn’t worked out—though in 1912 Karl Sundman did show that at least in principle the three-body problem could be solved in terms of an infinite series, albeit one that converges outrageously slowly.

But what does it mean to say that there can’t be a solution to the three-body problem? Galois had shown that there couldn’t be a solution to the generic quintic equation, at least in terms of radicals. But actually it’s still perfectly possible to express the solution in terms of elliptic or hypergeometric functions. So why can’t there be some more sophisticated class of functions that can be used to just “solve the three-body problem”?

Here are some pictures of what can actually happen in the three-body problem, with various initial conditions:

And looking at these immediately gives some indication of why it’s not easy to just “solve the three-body problem”. Yes, there are cases where what happens is fairly simple. But there are also cases where it’s not, and where the trajectories of the three bodies continue to be complicated and tangled for a long time.

So what’s fundamentally going on here? I don’t think traditional mathematics is the place to look. But I think what we’re seeing is actually an example of a general phenomenon I call computational irreducibility that I discovered in the 1980s in studying the computational universe of possible programs.

Many programs, like many instances of the three-body problem, behave in quite simple ways. But if you just start looking at all possible simple programs, it doesn’t take long before you start seeing behavior like this:
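To make "simple programs" concrete, here's a minimal sketch of one of the simplest kinds, an elementary cellular automaton. It uses rule 30 as the example. The rule, width and number of steps are our choices for illustration; nothing here reproduces the specific pictures from the original post. Each cell's new color depends only on itself and its two neighbors, yet the pattern grown from a single black cell quickly becomes intricate.

```python
def step(cells, rule=30):
    """Apply an elementary cellular automaton rule once (cells beyond the edges are 0)."""
    padded = [0] + cells + [0]
    return [
        # The 3-cell neighborhood, read as a binary number, selects a bit of the rule number.
        (rule >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]

def run(rule=30, width=31, steps=15):
    """Start from a single 1 in the middle and return the list of rows."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows

for r in run():
    print("".join("#" if c else "." for c in r))
```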

How can one tell what’s going to happen? Well, one can just keep explicitly running each program and seeing what it does. But the question is: is there some systematic way to jump ahead, and to predict what will happen without tracing through all the steps?

The answer is that in general there isn’t. And what I call the Principle of Computational Equivalence suggests that pretty much whenever one sees complex behavior, there won’t be.

Here’s the way to think about this. The system one’s studying is effectively doing a computation to work out what its behavior will be. So to jump ahead we’d in a sense have to do a more sophisticated computation. But what the Principle of Computational Equivalence says is that actually we can’t—and that whether we’re using our brains or our mathematics or a Turing machine or anything else, we’re always stuck with computations of the same sophistication.

So what about the three-body problem? Well, I strongly suspect that it’s an example of computational irreducibility: that in effect the computations it’s doing are as sophisticated as any computations that we can do, so there’s no way we can ever expect to systematically jump ahead and solve the problem. (We also can’t expect to just define some new finite class of functions that can just be evaluated to give the solution.)

I’m hoping that one day someone will rigorously prove this. There’s some technical difficulty, because the three-body problem is usually formulated in terms of real numbers that immediately have an infinite number of digits, but to compare with ordinary computation one has to require finite processes to set up initial conditions. (Ultimately one wants to show, for example, that there’s a “compiler” that can take any program, say for a Turing machine, and generate instructions for setting up initial conditions for a three-body problem whose evolution will give the same results as running that program. That would imply that the three-body problem is capable of universal computation.)

I have to say that I consider Newton in a sense very lucky. It could have been that it wouldn’t have been possible to work out anything interesting from his theory without encountering the kind of difficulties he had with the motion of the Moon—because one would always be running into computational irreducibility. But in fact, there was enough computational reducibility and enough that could be computed easily that one could see that the theory was useful in predicting features of the world (and not getting wrong answers, like with the apse of the Moon)—even if there were some parts that might take two centuries to work out, or never be possible at all.

Newton himself was certainly aware of the potential issue, saying that at least if one was dealing with gravitational interactions between many planets then “to define these motions by exact laws admitting of easy calculation exceeds, if I am not mistaken, the force of any human mind”.
And even today it’s extremely difficult to know what the long-term evolution of the solar system will be.

It’s not particularly that there’s sensitive dependence on initial conditions: we actually have measurements that should be precise enough to determine what will happen for a long time. The problem is that we just have to do the computation (a bit like computing the digits of π) to work out the behavior of the n-body problem that is our solar system.

Existing simulations show that for perhaps a few tens of millions of years, nothing too dramatic can happen. But after that we don’t know. Planets could change their order. Maybe they could even collide, or be ejected from the solar system. Computational irreducibility implies that at least after an infinite time it’s actually formally undecidable (in the sense of Gödel’s theorem or the halting problem) what can happen.

One of my children, when they were very young, asked me whether when dinosaurs existed the Earth could have had two moons. For years when I ran into celestial mechanics experts I would ask them that question—and it was notable how difficult they found it. Most now say that at least at the time of the dinosaurs we couldn’t have had an extra moon—though a billion years earlier it’s not clear.

We used to only have one system of planets to study. And the fact that there were (then) 9 of them used to be a classic philosopher’s example of a truth about the world that just happens to be the way it is, and isn’t “necessarily true” (like 2 + 2 = 4). But now of course we know about lots of exoplanets. And it’s beginning to look as if there might be a theory for things like how many planets a solar system is likely to have.

At some level there’s presumably a process like natural selection: some configurations of planets aren’t “fit enough” to be stable—and only those that are survive. In biology it’s traditionally been assumed that natural selection and adaptation is somehow what’s led to the complexity we see. But actually I suspect much of it is instead just a reflection of what generally happens in the computational universe—both in biology and in celestial mechanics. Now in celestial mechanics, we haven’t yet seen in the wild any particularly complex forms (beyond a few complicated gap structures in rings, and tumbling moons and asteroids). But perhaps elsewhere we’ll see things like those obviously tangled solutions to the three-body problem—that come closer to what we’re used to in biology.

It’s remarkable how similar the issues are across so many different fields. For example, the whole idea of using “perturbation theory” and series expansions that has existed since the 1700s in celestial mechanics is now also core to quantum field theory. But just like in celestial mechanics there’s trouble with convergence (maybe one should try renormalization or resummation in celestial mechanics). And in the end one begins to realize that there are phenomena—no doubt like turbulence or the three-body problem—that inevitably involve more sophisticated computations, and that need to be studied not with traditional mathematics of the kind that was so successful for Newton and his followers but with the kind of science that comes from exploring the computational universe.

Approaching Modern Times

But let’s get back to the story of the motion of the Moon. Between Brown’s tables, and Poincaré’s theoretical work, by the beginning of the 1900s the general impression was that whatever could reasonably be computed about the motion of the Moon had been computed.

Occasionally there were tests. Like for example in 1925, when there was a total solar eclipse visible in New York City, and the New York Times perhaps overdramatically said that “scientists [were] tense… [wondering] whether they or Moon is wrong as eclipse lags five seconds behind”. The fact is that a prediction accurate to 5 seconds was remarkably good, and we can’t do all that much better even today. (By the way, the actual article talks extensively about “Professor Brown”—as well as about how the eclipse might “disprove Einstein” and corroborate the existence of “coronium”—but doesn’t elaborate on the supposed prediction error.)

As a practical matter, Brown’s tables were not exactly easy to use: to find the position of the Moon from them required lots of mechanical desk calculator work, as well as careful transcription of numbers. And this led Leslie Comrie in 1932 to propose using a punch-card-based IBM Hollerith automatic tabulator—and with the help of Thomas Watson, CEO of IBM, what was probably the first “scientific computing laboratory” was established—to automate computations from Brown’s tables.

(When I was in elementary school in England in the late 1960s—before electronic calculators—I always carried around, along with my slide rule, a little book of “4-figure mathematical tables”. I think I found it odd that such a book would have an author—and perhaps for that reason I still remember the name: “L. J. Comrie”.)

By the 1950s, the calculations in Brown’s tables were slowly being rearranged and improved to make them more suitable for computers. But then with John F. Kennedy’s 1962 “We choose to go to the Moon” speech, there was suddenly urgent interest in getting the most accurate computations of the Moon’s position. As it turned out, though, it was basically just a tweaked version of Brown’s tables, running on a mainframe computer, that did the computations for the Apollo program.

At first, computers were used in celestial mechanics purely for numerical computation. But by the mid-1960s there were also experiments in using them for algebraic computation, and particularly to automate the generation of series expansions. Wallace Eckert at IBM started using FORMAC to redo Brown’s tables, while in Cambridge David Barton and Steve Bourne (later the creator of the “Bourne shell” (sh) in Unix) built their own CAMAL computer algebra system to try extending the kind of thing Delaunay had done. (And by 1970, Delaunay’s 7th-order calculations had been extended to 20th order.)

When I myself started to work on computer algebra in 1976 (primarily for computations in particle physics), I’d certainly heard about CAMAL, but I didn’t know what it had been used for (beyond vaguely “celestial mechanics”). And as a practicing theoretical physicist in the late 1970s, I have to say that the “problem of the Moon” that had been so prominent in the 1700s and 1800s had by then fallen into complete obscurity.

I remember for example in 1984 asking a certain Martin Gutzwiller, who was talking about quantum chaos, what his main interest actually was. And when he said “the problem of the Moon”, I was floored; I didn’t know there still was any problem with the Moon. As it turns out, in writing this post I found out that Gutzwiller was actually the person who took over from Eckert and spent nearly two decades working on trying to improve the computations of the position of the Moon.

Why Not Just Solve It?

Traditional approaches to the three-body problem come very much from a mathematical way of thinking. But modern computational thinking immediately suggests a different approach. Given the differential equations for the three-body problem, why not just directly solve them? And indeed in the Wolfram Language there’s a built-in function NDSolve for numerically solving systems of differential equations.

So what happens if one just feeds in equations for a three-body problem? Well, here are the equations:
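One standard way to write them uses our own notation: point masses m_i at positions r_i, with Newton's constant G. The total energy E, which the equations conserve, is also shown:

```latex
\ddot{\mathbf{r}}_i = \sum_{j \neq i} \frac{G\, m_j\, (\mathbf{r}_j - \mathbf{r}_i)}{\lvert \mathbf{r}_j - \mathbf{r}_i \rvert^{3}}, \qquad i = 1, 2, 3

E = \sum_{i=1}^{3} \tfrac{1}{2}\, m_i \lvert \dot{\mathbf{r}}_i \rvert^{2} \;-\; \sum_{i<j} \frac{G\, m_i m_j}{\lvert \mathbf{r}_i - \mathbf{r}_j \rvert}
```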

Now as an example let’s set the masses to random values:

And let’s define the initial position and velocity for each body to be random as well:

Now we can just use NDSolve to get the solutions (it gives them as implicit approximate numerical functions of t):

And now we can plot them. And now we’ve got a solution to a three-body problem, just like that!
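For readers without the Wolfram Language, here's a rough Python analog of the steps above. It is only a sketch. It assumes G = 1 and draws random masses and initial conditions from a seeded generator. In place of NDSolve's adaptive methods, it uses a fixed-step classical fourth-order Runge-Kutta integrator. A fixed step can become inaccurate during close encounters, so treat the output as illustrative.

```python
import math
import random

G = 1.0  # gravitational constant in arbitrary units (assumption)

def derivatives(state, masses):
    """state holds [x, y, z, vx, vy, vz] for each body, flattened; returns d(state)/dt."""
    n = len(masses)
    pos = [state[6 * i:6 * i + 3] for i in range(n)]
    out = []
    for i in range(n):
        acc = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r3 = (d[0] ** 2 + d[1] ** 2 + d[2] ** 2) ** 1.5
            for k in range(3):
                acc[k] += G * masses[j] * d[k] / r3
        out.extend(state[6 * i + 3:6 * i + 6])  # dx/dt = v
        out.extend(acc)                          # dv/dt = a
    return out

def rk4_step(state, masses, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return [a + h * b for a, b in zip(s, k)]
    k1 = derivatives(state, masses)
    k2 = derivatives(shifted(state, k1, dt / 2), masses)
    k3 = derivatives(shifted(state, k2, dt / 2), masses)
    k4 = derivatives(shifted(state, k3, dt), masses)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def momentum(state, masses):
    """Total linear momentum, which the equations conserve exactly."""
    return [sum(m * state[6 * i + 3 + k] for i, m in enumerate(masses)) for k in range(3)]

rng = random.Random(1)
masses = [rng.uniform(0.5, 1.5) for _ in range(3)]
state = [rng.uniform(-1, 1) for _ in range(18)]  # random positions and velocities

trajectory = [state]
for _ in range(2000):
    state = rk4_step(state, masses, dt=0.001)
    trajectory.append(state)

print("masses:", [round(m, 3) for m in masses])
print("body 1 final position:", [round(c, 3) for c in trajectory[-1][0:3]])
```

Runge-Kutta methods preserve linear invariants such as total momentum, which makes momentum a useful sanity check. Energy, a nonlinear invariant, is not preserved exactly.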

Well, obviously this is using the Wolfram Language and a huge tower of modern technology. But would it have been possible even right from the beginning for people to generate direct numerical solutions to the three-body problem, rather than doing all that algebra? Back in the 1700s, Euler already knew what’s now called Euler’s method for finding approximate numerical solutions to differential equations. So what if he’d just used that method to calculate the motion of the Moon?

The method relies on taking a sequence of discrete steps in time. And if he’d used, say, a step size of a minute, then he’d have had to take 40,000 steps to get results for a month, but he should have been able to successfully reproduce the position of the Moon to about a percent. If he’d tried to extend to 3 months, however, then he would already have had at least a 10% error.
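Here's a minimal sketch of how errors in Euler's method accumulate. It uses the simplest test case rather than the real Earth-Moon-Sun problem: a circular orbit of radius 1 about a fixed center, with GM = 1, so the period is 2π. The step counts are assumptions for illustration and are not meant to reproduce the percentages above. The point is only that the orbit spirals outward and the error grows with the number of orbits.

```python
import math

def euler_orbit(steps_per_orbit, orbits):
    """Integrate a circular Kepler orbit (GM = 1, r = 1) with explicit Euler; return final radius."""
    dt = 2 * math.pi / steps_per_orbit
    x, y, vx, vy = 1.0, 0.0, 0.0, 1.0
    for _ in range(steps_per_orbit * orbits):
        r3 = (x * x + y * y) ** 1.5
        ax, ay = -x / r3, -y / r3
        # Explicit Euler: both updates use the values from the start of the step.
        x, y, vx, vy = x + dt * vx, y + dt * vy, vx + dt * ax, vy + dt * ay
    return math.hypot(x, y)

for orbits in (1, 3):
    r = euler_orbit(steps_per_orbit=40000, orbits=orbits)
    print(f"{orbits} orbit(s): radius error {abs(r - 1.0):.2e}")
```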

Any numerical scheme for solving differential equations in practice eventually builds up some kind of error—but the more one knows about the equations one’s solving, and their expected solutions, the more one’s able to preprocess and adapt things to minimize the error. NDSolve has enough automatic adaptivity built into it that it’ll do pretty well for a surprisingly long time on a typical three-body problem. (It helps that the Wolfram Language and NDSolve can handle numbers with arbitrary precision, not just machine precision.)

But if one looks, say, at the total energy of the three-body system—which one can prove from the equations should stay constant—then one will typically see an error slowly build up in it. One can avoid this if one effectively does a change of variables in the equations to “factor out” energy. And one can imagine doing a whole hierarchy of algebraic transformations that in a sense give the numerical scheme as much help as possible.
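Monitoring the total energy is a standard diagnostic. The sketch below compares explicit Euler with a leapfrog (velocity Verlet) integrator on a hypothetical circular test orbit with GM = 1. Leapfrog is a swapped-in technique here. It keeps the energy error bounded because it is symplectic, which is related to, but not the same as, the change of variables described above.

```python
import math

def energy(x, y, vx, vy):
    """Specific orbital energy for GM = 1."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

def accel(x, y):
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

def euler(state, dt):
    """One explicit Euler step."""
    x, y, vx, vy = state
    ax, ay = accel(x, y)
    return x + dt * vx, y + dt * vy, vx + dt * ax, vy + dt * ay

def leapfrog(state, dt):
    """One kick-drift-kick leapfrog step."""
    x, y, vx, vy = state
    ax, ay = accel(x, y)
    vx, vy = vx + 0.5 * dt * ax, vy + 0.5 * dt * ay   # half kick
    x, y = x + dt * vx, y + dt * vy                    # drift
    ax, ay = accel(x, y)
    vx, vy = vx + 0.5 * dt * ax, vy + 0.5 * dt * ay   # half kick
    return x, y, vx, vy

def energy_drift(stepper, dt=0.01, steps=5000):
    """Absolute change in energy after integrating a circular orbit for steps * dt time units."""
    state = (1.0, 0.0, 0.0, 1.0)
    e0 = energy(*state)
    for _ in range(steps):
        state = stepper(state, dt)
    return abs(energy(*state) - e0)

print("Euler energy drift:   ", energy_drift(euler))
print("leapfrog energy drift:", energy_drift(leapfrog))
```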

And indeed since at least the 1980s that’s exactly what’s been done in practical work on the three-body problem, and the Earth-Moon-Sun system. So in effect it’s a mixture of the traditional algebraic approach from the 1700s and 1800s, together with modern numerical computation.

The Real Earth-Moon-Sun Problem

OK, so what’s involved in solving the real problem of the Earth-Moon-Sun system? The standard three-body problem gives a remarkably good approximation to the physics of what’s happening. But it’s obviously not the whole story.

For a start, the Earth isn’t the only planet in the solar system. And if one’s trying to get sufficiently accurate answers, one’s going to have to take into account the gravitational effect of other planets. The most important is Jupiter, and its typical effect on the orbit of the Moon is at about the level—sufficiently large that for example Brown had to take it into account in his tables.

The next effect is that the Earth isn’t just a point mass, or even a precise sphere. Its rotation makes it bulge at the equator, and that affects the orbit of the Moon at the level.

Orbits around the Earth ultimately depend on the full mass distribution and gravitational field of the Earth (which is what Sputnik-1 was nominally launched to map)—and both this, and the reverse effect from the Moon, come in at the level. At the level there are then effects from tidal deformations (“solid tides”) on the Earth and Moon, as well as from gravitational redshift and other general relativistic phenomena.

To predict the position of the Moon as accurately as possible one ultimately has to have at least some model for these various effects.

But there’s a much more immediate issue to deal with: one has to know the initial conditions for the Earth, Sun and Moon, or in other words, one has to know as accurately as possible what their positions and velocities were at some particular time.

And conveniently enough, there’s now a really good way to do that, because Apollo 11, 14 and 15 all left laser retroreflectors on the Moon. And by precisely timing how long it takes a laser pulse from the Earth to round-trip to these retroreflectors, it’s now possible in effect to measure the position of the Moon to millimeter accuracy.
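The basic ranging arithmetic is simple: the distance is half the round-trip time multiplied by the speed of light. The sketch below ignores atmospheric delay, relativistic corrections, and the offsets between stations, reflectors and centers of mass; real analyses model all of these. It also indicates why millimeter-level ranging calls for timing at the picosecond level.

```python
C = 299_792_458.0  # speed of light, m/s (exact by SI definition)

def one_way_distance_m(round_trip_s):
    """Distance to the reflector from a round-trip light time (vacuum, no corrections)."""
    return C * round_trip_s / 2

def timing_needed_s(distance_resolution_m):
    """Round-trip timing resolution corresponding to a given one-way distance resolution."""
    return 2 * distance_resolution_m / C

print(f"{one_way_distance_m(2.5) / 1000:.0f} km for a 2.5 s round trip")
print(f"{timing_needed_s(0.001) * 1e12:.1f} ps of round-trip timing per millimetre")
```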

OK, so how do modern analogs of the Babylonian ephemerides actually work? Internally they’re dealing with the equations for all the significant bodies in the solar system. They do symbolic preprocessing to make their numerical work as easy as possible. And then they directly solve the differential equations for the system, appropriately inserting models for things like the mass distribution in the Earth.

They start from particular measured initial conditions, but then they repeatedly insert new measurements, trying to correct the parameters of the model so as to optimally reproduce all the measurements they have. It’s very much like a typical machine learning task, with the training data here being observations of the solar system (and with the fitting typically done by least squares).
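Here is a toy version of that fitting step. It is not how any real ephemeris is built. The sketch recovers the parameters of a hypothetical model (a constant plus one periodic term with a known period) from sampled "observations" by ordinary least squares, solving the normal equations directly.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(times, values, omega):
    """Least-squares fit of values ~ c0 + c1*cos(omega t) + c2*sin(omega t)."""
    basis = [[1.0, math.cos(omega * t), math.sin(omega * t)] for t in times]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * v for row, v in zip(basis, values)) for i in range(3)]
    return solve(ata, aty)

# Synthetic "observations" from known parameters, so we can check the fit.
omega = 2 * math.pi / 27.3          # a hypothetical period of 27.3 days
true = (0.5, 2.0, -1.0)
times = [0.7 * k for k in range(100)]
obs = [true[0] + true[1] * math.cos(omega * t) + true[2] * math.sin(omega * t) for t in times]
print(fit(times, obs, omega))
```

Real fits involve many more parameters, weighted and noisy data, and partial derivatives obtained by integrating variational equations. The linear-algebra core, though, is the same idea.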

But, OK, so there’s a model one can run to figure out something like the position of the Moon. But one doesn’t want to have to explicitly do that every time one needs to get a result; instead one wants in effect just to store a big table of pre-computed results, and then to do something like interpolation to get any particular result one needs. And indeed that’s how it’s done today.

How It’s Really Done

Back in the 1960s NASA started directly solving differential equations for the motion of planets. The Moon was more difficult to deal with, but by the 1980s that too was being handled in a similar way. Ongoing data from things like the lunar retroreflectors was added, and all available historical data was inserted as well.

The result of all this was the JPL Development Ephemeris (JPL DE). In addition to new observations being used, the underlying system gets updated every few years, for example to get what’s needed for some spacecraft going to some new place in the solar system. (The latest is DE441—that follows DE432, which was built for going to Pluto.)

But so how is the actual ephemeris delivered? Well, for every thousand years covered, the ephemeris has about 100 megabytes of results, given as coefficients for Chebyshev polynomials, which are convenient for interpolation. And for any given quantity in any given coordinate system over a particular period of time, one accesses the appropriate parts of these results.
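The following sketch illustrates why Chebyshev coefficients are convenient. It computes coefficients for a function on one time interval from samples at Chebyshev nodes. It then evaluates the series anywhere in that interval with Clenshaw's recurrence, after mapping time to [-1, 1]. The sample function, interval and degree are our assumptions. Real ephemeris files store precomputed coefficients per time block and per coordinate, in formats not reproduced here.

```python
import math

def cheb_coeffs(f, t0, t1, n):
    """Chebyshev coefficients c_0..c_{n-1} of f on [t0, t1], from n Chebyshev nodes."""
    mid, half = (t0 + t1) / 2, (t1 - t0) / 2
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    samples = [f(mid + half * u) for u in nodes]
    coeffs = []
    for j in range(n):
        s = sum(samples[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        coeffs.append((1.0 if j == 0 else 2.0) * s / n)
    return coeffs

def cheb_eval(coeffs, t0, t1, t):
    """Evaluate sum c_j T_j(u), with u the time mapped to [-1, 1], by Clenshaw's recurrence."""
    u = (2 * t - t0 - t1) / (t1 - t0)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2 * u * b1 - b2 + c, b1
    return u * b1 - b2 + coeffs[0]

def f(t):
    """A hypothetical smooth coordinate (two periodic terms), in arbitrary units."""
    return math.cos(2 * math.pi * t / 27.3) + 0.05 * math.sin(2 * math.pi * t / 14.8)

c = cheb_coeffs(f, 0.0, 8.0, 12)   # 12 coefficients for an 8-day block
print(cheb_eval(c, 0.0, 8.0, 3.3), f(3.3))
```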

In Wolfram Language, it’s all packaged up into the function AstroPosition—which here gives the position of the Moon in coordinates relative to the equator of the Earth right now:

OK, but so how does one find an eclipse? Well, it’s an iterative process. Start with an approximation, perhaps from the saros cycle. Then interpolate the ephemeris and look at the result. Then keep iterating until one finds out just when the Moon will be in the appropriate position.
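The iteration can be sketched as follows. The starting guess comes from the saros, which is 223 synodic months or about 6585.32 days. The refinement is a golden-section search for the minimum of the Sun-Moon angular separation. The separation function below is a made-up stand-in, hypothetical and not derived from any ephemeris; a real computation would interpolate an ephemeris at each trial time.

```python
import math

SAROS_DAYS = 6585.32  # 223 synodic months

def golden_min(f, a, b, tol=1e-6):
    """Golden-section search for the minimum of a unimodal function f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def toy_separation(t, t_new_moon=6585.69):
    """Hypothetical Sun-Moon angular separation in degrees; minimum at t_new_moon (days)."""
    rate = 12.2  # deg/day, roughly the Moon's motion relative to the Sun
    return math.hypot(rate * (t - t_new_moon), 0.21)

guess = 0.0 + SAROS_DAYS  # one saros after a (hypothetical) eclipse at t = 0
t_min = golden_min(toy_separation, guess - 1.0, guess + 1.0)
print(f"refined time: {t_min:.5f} days (saros guess was {guess:.2f})")
```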

But actually there’s some more to do. Because what’s originally computed are the positions of the barycenters (centers of mass) of the various bodies. But now one has to figure out how the bodies are oriented.

The Earth rotates, and we know its rate quite precisely. But the Moon is basically locked with the same face pointing to the Earth, except that in practice there are small “librations” where the Moon wobbles a little back and forth—and these turn out to be particularly troublesome to predict.

Where Will the Eclipse Be?

OK, so let’s say one knows where the Earth, Moon and Sun are. How does one then figure out what type of eclipse will happen, and where on the Earth the eclipse will actually hit? Well, there’s some further geometry to do. And here’s the beginning of what’s involved:

Basically the Moon generates a cone of shadow, and then the question is how this cone intersects the Earth. If the tip of the cone is inside the Earth, that means there’ll be a region of total shadow (“umbra”) on the Earth—and a total eclipse. (If the tip is above the Earth, there’ll be an annular eclipse, in which there’s a “ring of sun” visible around the Moon.)
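The "where is the tip of the cone" test can be sketched with similar triangles. A sphere of radius R_moon, lit by a sphere of radius R_sun at distance d, casts an umbra of length d·R_moon/(R_sun − R_moon). The sketch makes several assumptions: rounded mean radii, a fixed Earth-Sun distance of 1 AU, collinear bodies, and no Earth flattening. It is therefore only a rough classifier for a central eclipse.

```python
R_SUN = 696_000.0     # km, approximate
R_MOON = 1_737.4      # km, mean radius
R_EARTH = 6_378.0     # km, equatorial radius
AU = 149_597_870.7    # km

def umbra_length(sun_moon_km):
    """Distance from the Moon's center to the tip of its umbra cone (similar triangles)."""
    return sun_moon_km * R_MOON / (R_SUN - R_MOON)

def central_eclipse_type(earth_moon_km, earth_sun_km=AU):
    """'total' if the umbra tip reaches past the nearest point of Earth's surface, else 'annular'."""
    sun_moon = earth_sun_km - earth_moon_km    # Sun, Moon and Earth taken as collinear
    to_surface = earth_moon_km - R_EARTH       # Moon's center to the sub-lunar point
    return "total" if umbra_length(sun_moon) > to_surface else "annular"

for d in (363_300, 384_400, 405_500):          # near perigee, mean, near apogee (km)
    print(d, central_eclipse_type(d))
```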

By the way, the more complete geometry is like this (again not to scale)

where now we’ve included the penumbra in which only part of the Sun is shadowed by the Moon. In the particular case shown, the umbra cone “misses the Earth”, so there’s no total eclipse, but there’s still a partial eclipse where part of the Sun is shadowed.

OK, but let’s say there’s going to be a total eclipse. To see where on the Earth the region of totality will be, we have to work out where the “cone of total shadow” (i.e. umbra) will intersect the surface of the Earth. It’s a somewhat complicated 3D geometry problem:

It’s easiest to understand what happens by looking at things from the position of the Sun. The light gray region is the penumbra, and the little black dot is the region of totality (i.e. the umbra):

As the Earth and Moon move in their orbits, the region of shadow will move relative to the Earth:

But now there’s another part of the story, which is the rotation of the Earth. And if we include that, we’ll see that the region of totality (at least in this case) traces out a kind of S-shaped curve on the surface of the Earth:

All this tricky geometry got figured out in 1824 by Friedrich Bessel, who introduced what are now called the Besselian elements—eight variables that specify the location, orientation and aperture of the umbra and penumbra cones, as well as the orientation of the Earth, as a function of time. And for any given eclipse, the whole story of its appearance and trajectory on the surface of the Earth is determined by its Besselian elements.

When Will the Eclipse Arrive?

OK, so now we know what the trajectory of an eclipse will be. But how do we figure out at what time the eclipse will actually reach a given point on Earth? Well, first we have to be clear on our definition of time. And there’s an immediate issue with the speed of light and special relativity. What does it mean to say that the positions of the Earth and Sun are such-and-such at such-and-such a time? Because it takes light about 8 minutes to get to the Earth from the Sun, we only get to see where the Sun was 8 minutes ago, not where it is now.
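The "about 8 minutes" is just the mean Earth-Sun distance divided by the speed of light. A quick check, using the defined values of the astronomical unit and c:

```python
AU_M = 149_597_870_700      # metres (IAU 2012 definition of the astronomical unit)
C_M_PER_S = 299_792_458     # metres per second (exact, SI definition)

light_time_s = AU_M / C_M_PER_S
print(f"{light_time_s:.1f} s = {light_time_s / 60:.2f} minutes")
```

The Earth-Sun distance varies by a few percent over the year, so the actual delay varies a little around this value.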

And what we need is really a classic special relativity setup. We essentially imagine that the solar system is filled with a grid of clocks that have been synchronized by light pulses. And what a modern ephemeris does is to quote the results for positions of bodies in the solar system relative to the times on those clocks. (General relativity implies that in different gravitational fields the clocks will run at different rates, but for our purposes this is a tiny effect. What isn’t a tiny effect, though, is including retardation in the equations for the n-body problem, which turns them into delay differential equations.)

But now there’s another issue. If one’s observing the eclipse, one’s going to be using some timepiece (phone?) to figure out what time it is. And if it’s working properly that timepiece should show official “civil time” that’s based on UTC—which is what NTP internet time is synchronized to. But the issue is that UTC has a complicated relationship to the time used in the astronomical ephemeris.

The starting point is what’s called UT1: a definition of time in which one day is the average time it takes the Earth to rotate once relative to the Sun. The catch is that this average isn’t constant, because the rotation of the Earth is gradually slowing down, primarily as a result of interactions with the Moon. UTC, meanwhile, is defined by atomic clocks whose timekeeping is independent of any issues about the rotation of the Earth.

There’s a convention for keeping UT1 aligned with UTC: if UT1 is going to get more than 0.9 seconds away from UTC, then a leap second is added to UTC. One might think this would be a tiny effect, but actually, since 1972, a total of 27 leap seconds have been added (as specified in the Wolfram Language by GeoOrientationData):
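The sketch below shows how these time scales chain together by converting a UTC instant to Terrestrial Time (TT), a uniform scale of the kind ephemerides use. It relies on two standard facts. First, TT = TAI + 32.184 s. Second, TAI − UTC has been 37 s since 1 January 2017: 10 s when leap seconds began in 1972, plus the 27 added since. The table is deliberately truncated to recent entries and would need updating if a new leap second were announced. UT1 − UTC, which must come from published Earth orientation data, isn't modeled.

```python
from datetime import datetime, timedelta, timezone

# (start of validity in UTC, TAI - UTC in seconds). Deliberately truncated to the
# last few entries; a real implementation needs the full published table.
LEAP_TABLE = [
    (datetime(2012, 7, 1, tzinfo=timezone.utc), 35),
    (datetime(2015, 7, 1, tzinfo=timezone.utc), 36),
    (datetime(2017, 1, 1, tzinfo=timezone.utc), 37),
]
TT_MINUS_TAI = 32.184  # seconds, fixed by definition

def tai_minus_utc(t_utc):
    """Return TAI - UTC (seconds) for a UTC datetime covered by LEAP_TABLE."""
    value = None
    for start, offset in LEAP_TABLE:
        if t_utc >= start:
            value = offset
    if value is None:
        raise ValueError("date precedes the truncated leap-second table")
    return value

def utc_to_tt(t_utc):
    """Return the TT reading for a UTC instant (still tagged UTC by Python; treat as a label)."""
    return t_utc + timedelta(seconds=tai_minus_utc(t_utc) + TT_MINUS_TAI)

instant = datetime(2024, 4, 8, 18, 0, 0, tzinfo=timezone.utc)  # an arbitrary instant
print("TT - UTC =", (utc_to_tt(instant) - instant).total_seconds(), "s")
```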

Exactly when a new leap second will be needed is unpredictable; it depends on things like what earthquakes have occurred. But we need to account for leap seconds if we’re going to get the time of the eclipse correct to the second relative to UTC or internet time.

There are a few other effects that are also important in the precise observed timing of the eclipse. The most obvious is geo elevation. In doing astronomical computations, the Earth is assumed to be an ellipsoid. (There are many different definitions, corresponding to different geodetic “datums”—and that’s an issue in defining things like “sea level”, but it’s not relevant here.) But if you’re at a different height above the ellipsoid, the cone of shadow from the eclipse will reach you at a different time. And the size of this effect can be as much as 0.3 seconds for every 1000 feet of height.

We’re readily able to account for all of the effects we’ve talked about. But there is one remaining effect that’s a bit more difficult. Right at the beginning or end of totality one typically sees points of light on the rim of the Moon. Known as Baily’s beads, these are the result of rays of light that make it to us between mountains on the Moon. Figuring out exactly when all these rays are extinguished requires taking geo elevation data for the Moon, and effectively doing full 3D ray tracing. And in doing this, one ends up with the rather bizarre conclusion that the region of shadow on the Earth isn’t a perfect circle; instead it’s roughly a polygon, each of whose edges is associated with a particular Baily’s bead. The effect of this can last as long as a second, and can cause the precise edge of totality to move by as much as a mile. (One can also imagine effects having to do with the corona of the Sun, which is constantly changing.)

But in the end, even though the shadow of the Moon on the Earth moves at more than 1000 mph, modern science successfully makes it possible to compute when the shadow will reach a particular point on Earth to an accuracy of perhaps a second. And that’s what our precisioneclipse.com website is set up to do.
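A back-of-the-envelope check connects these numbers. Taking the text's figure of 1000 mph as a lower bound on the shadow's speed, a one-second timing error corresponds to the shadow edge moving at least about a quarter of a mile. Likewise, treating 0.3 seconds per 1000 feet of elevation as an upper bound gives at most 1.5 seconds at 5000 feet. The 5000-foot elevation is just our example.

```python
SHADOW_SPEED_MPH = 1000.0   # lower bound on the shadow speed quoted in the text
MAX_S_PER_1000_FT = 0.3     # upper bound on the elevation effect quoted in the text

def shadow_travel_miles(seconds, speed_mph=SHADOW_SPEED_MPH):
    """Distance the shadow edge moves in the given number of seconds."""
    return speed_mph * seconds / 3600.0

def max_elevation_shift_s(height_ft):
    """Upper bound on the timing shift from observer elevation above the ellipsoid."""
    return MAX_S_PER_1000_FT * height_ft / 1000.0

print(f"{shadow_travel_miles(1.0):.3f} miles per second of timing error")
print(f"up to {max_elevation_shift_s(5000):.1f} s at 5000 ft")
```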

Eclipse Experiences

Written August 15, 2017

I saw my first partial solar eclipse more than 50 years ago. And I’ve seen one total solar eclipse before in my life—in 1991. It was the longest eclipse (6 minutes 53 seconds) that’ll happen for more than a century.

There was a certain irony to my experience, though, especially in view of our efforts now to predict the exact arrival time of next week’s eclipse. I’d chartered a plane and flown to a small airport in Mexico (yes, that’s me on the left with the silly hat)—and my friends and I had walked to a beautiful deserted beach, and were waiting under a cloudless sky for the total eclipse to begin.

I felt proud of how prepared I was—with maps marking to the minute when the eclipse should arrive. But then I realized: there we were, out on a beach with no obvious signs of modern civilization—and nobody had brought any properly set timekeeping device (and in those days my cellphone was just a phone, and didn’t even have signal there).

And so it was that I missed seeing a demonstration of an impressive achievement of science. And instead I got to experience the eclipse pretty much the way people throughout history have experienced eclipses—even if I did know that the Moon would continue gradually eating into the Sun and eventually cover it, and that it wouldn’t make the world end.

There’s always something sobering about astronomical events, and about realizing just how tiny human scales are compared to them. Billions of eclipses have happened over the course of the Earth’s history. Recorded history has covered only a few thousand of them. On average, there’s a total eclipse at any given place on Earth roughly every 400 years; in Jackson, WY, where I’m planning to see next week’s eclipse, it turns out the next total eclipse will be 727 years from now, in 2744.

In earlier times, civilizations built giant monuments to celebrate the motions of the Sun and Moon. Today, for the eclipse next week, what we’re making is a website. But that website builds on one of the great epics of human intellectual history—stretching back to the earliest times of systematic science, and encompassing contributions from a remarkable cross-section of the most celebrated scientists and mathematicians from past centuries.

It’ll be about 9538 days since the eclipse I saw in 1991. The Moon will have traveled some 500 million miles around the Earth, and the Earth some 15 billion miles around the Sun. But now—in a remarkable triumph of science—we’re computing to the second when they’ll be lined up again.
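The day count is easy to check with Python's standard datetime module. The check takes the 1991 eclipse as 11 July 1991 and this one as 21 August 2017. The same arithmetic gives the 2422-day gap to 8 April 2024 mentioned later. The mileage estimates assume circular orbits with mean radii and sidereal periods, so they are only rough.

```python
import math
from datetime import date

ECLIPSES = {
    "1991": date(1991, 7, 11),
    "2017": date(2017, 8, 21),
    "2024": date(2024, 4, 8),
}

days_1991_2017 = (ECLIPSES["2017"] - ECLIPSES["1991"]).days
days_2017_2024 = (ECLIPSES["2024"] - ECLIPSES["2017"]).days

# Circular-orbit approximations (mean radii in miles, sidereal periods in days).
SIDEREAL_MONTH_DAYS = 27.321661
SIDEREAL_YEAR_DAYS = 365.256
MOON_ORBIT_RADIUS_MI = 238_900
EARTH_ORBIT_RADIUS_MI = 92_960_000

moon_miles = days_1991_2017 / SIDEREAL_MONTH_DAYS * 2 * math.pi * MOON_ORBIT_RADIUS_MI
earth_miles = days_1991_2017 / SIDEREAL_YEAR_DAYS * 2 * math.pi * EARTH_ORBIT_RADIUS_MI

print(days_1991_2017, days_2017_2024)
print(f"Moon: {moon_miles:.3g} miles, Earth: {earth_miles:.3g} miles")
```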

Written in Anticipation of April 8, 2024

In the days leading up to August 21, 2017, millions of people accessed our precisioneclipse.com website—with their geoIPs increasingly concentrating near the path of totality. I had traveled to Wyoming and—with a couple of hours to spare—found a place with a clear view across a valley. And this now being 2017, I tweeted:

But would our carefully done computations actually be accurate? The eclipse was going to make landfall on the Oregon coast, and, conveniently, we had a spotter right there. From where I was, the partial eclipse was well underway. And then I got a text: yes, totality in Oregon had come at the predicted second. It would take the shadow of the Moon a little under 20 minutes to reach me.

Unlike in 1991, I, like everyone else, had a cellphone with a precise clock and was able to access our precisioneclipse.com site:

While I was waiting I was making little images of the crescent Sun with my fingers—repeating something I’d first noticed more than 50 years earlier when I saw my first partial eclipse at the age of 6:

A few minutes before the eclipse, I started to see a strange shimmering (invisible on any video I took): shadow bands, a strange and poorly understood eclipse phenomenon. And then, there it came, sweeping across the valley: totality. Arriving right at the predicted second:

I had set up a camera to capture a video of the eclipse, and a little later that day I did an analysis of it—and, since earlier in 2017 I’d started routinely doing livestreams, I livestreamed it:

At the end, I published my notebook to the Wolfram Cloud, and it’s still there:

And now six years have passed. Much has happened in our human world. But the Moon has just inexorably continued in its orbit. And 2422 days later, it will once again line up to create a total eclipse…

Computing the Eclipse: Astronomy in the Wolfram Language

Mark Long — Fri, 29 Mar 2024 18:30:58 +0000

Basic Eclipse Computation

It’s taken millennia to get to the point where it’s possible to accurately compute eclipses. But now—as a tiny part of making “everything in the world” computable—computation about eclipses is just a built-in feature of the Wolfram Language.

The core function is SolarEclipse. By default, SolarEclipse tells us the time of the next solar eclipse from now:

It can also tell us the next solar eclipse from any time in a range extending more than 10,000 years into the past and the future:

By default, SolarEclipse tells us about all solar eclipses, including partial ones. But we can request only total eclipses (or annular ones, etc.):

SolarEclipse immediately lets you compute nearly 100 properties of an eclipse. The most basic property is the type of eclipse; this tells us that the next eclipse from now will be total:

And here are the types of all eclipses for the rest of the decade:

This gives a timeline of these eclipses; notice that most eclipses are separated by 5 or 6 months, but there’s one pair (in 2029) that is just a month apart:
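The 5-or-6-month spacing has a simple origin. Eclipses can only happen in “eclipse seasons”, when the Sun is near one of the nodes of the Moon’s orbit. The Sun returns to the same node once per “eclipse year”. Here is a rough sketch of that arithmetic, assuming mean values for the synodic and draconic months (the real lengths vary slightly):

```python
SYNODIC_MONTH = 29.530588853   # days between new moons (mean value, assumed)
DRACONIC_MONTH = 27.212220817  # days between passages through the same node (mean, assumed)

# In one eclipse year the Moon completes exactly one more draconic month than
# synodic months: E/D - E/S = 1, so 1/E = 1/D - 1/S.
eclipse_year = 1 / (1 / DRACONIC_MONTH - 1 / SYNODIC_MONTH)

# The Sun passes the two nodes half an eclipse year apart: two seasons per eclipse year.
season_spacing = eclipse_year / 2

print(f"eclipse year ≈ {eclipse_year:.2f} days")
print(f"season spacing ≈ {season_spacing:.1f} days "
      f"≈ {season_spacing / SYNODIC_MONTH:.2f} synodic months")
```

Half an eclipse year is about 5.87 synodic months, so consecutive seasons are usually 5 or 6 lunations apart. A season lasts slightly longer than a synodic month, so two successive new moons can occasionally both produce eclipses, as with the pair in 2029.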

Another basic property of an eclipse is its magnitude: how much of the diameter of the disk of the Sun is covered by the Moon. The next eclipse is total, so the magnitude is greater than 1:
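Here is a minimal sketch of this definition. It is not how SolarEclipse computes it internally. During a partial phase, the fraction of the Sun’s diameter covered follows from the apparent radii of the two disks and the angular separation of their centers. For central eclipses, catalogs commonly quote the ratio of the apparent diameters instead, which exceeds 1 when the Moon looks bigger than the Sun. The radii below are illustrative assumptions:

```python
def partial_magnitude(sun_radius, moon_radius, separation):
    """Fraction of the Sun's diameter covered, measured along the line of centers.

    Meant for partial phases (separation >= |moon_radius - sun_radius|).
    All arguments are apparent angular sizes in the same units, e.g. degrees.
    """
    if separation >= sun_radius + moon_radius:
        return 0.0  # the disks don't overlap at all
    return (sun_radius + moon_radius - separation) / (2 * sun_radius)


def diameter_ratio(sun_radius, moon_radius):
    """Magnitude as commonly quoted for central eclipses: Moon/Sun apparent size."""
    return moon_radius / sun_radius


# Illustrative (assumed) apparent radii in degrees
SUN_R, MOON_R = 0.2665, 0.2800
print(partial_magnitude(SUN_R, MOON_R, 0.30))  # a partial phase
print(diameter_ratio(SUN_R, MOON_R))           # > 1, as in a total eclipse
```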

Looking at the sequence of eclipses for the remainder of the decade, we see which ones are total and which are not:

Here’s the path of totality for the upcoming total eclipse:

And here’s where the eclipse is partial:

This is easier to understand with a different geo projection:

There are all sorts of special points and lines associated with these regions. This gives the “point of maximum eclipse” (i.e. essentially the place where the eclipse lasts longest):

And this gives the precise time (converted to my current time zone) of the maximum eclipse:

This gives the time of maximum eclipse in the time zone of the point of maximum eclipse:

At the point of maximum eclipse, this gives the duration of the umbra (i.e. the time of totality):

And here’s a map of where on the Earth the umbra is at the time of maximum eclipse:

Zooming out to a range of 500 miles makes it easier to tell where this is:

This shows the position of the umbra every minute for the hour after the time of maximum eclipse:

Here’s a summary map of the eclipse, including for example (in red) the points of first and last contact of the penumbra:

Instead of looking at what amounts to the shadow of the Moon on the Earth, we can ask what we’d see in the sky. During a total eclipse the Moon will completely cover the Sun. But here’s what happens just 15 minutes before the time of maximum eclipse:

Patterns of Eclipses

Here’s a map of the path of totality for all the total eclipses for the next 50 years:

Over the course of 500 years there are lots of total eclipses:

Although it’s not terribly obvious there, there’s actually a lot of regularity in these paths. In particular, as we discussed previously, similar eclipses occur in “saros series”, separated by a time of about 1 saros, or roughly 18 years. Here are the paths of eclipses that appear for the 10 saroses after the next eclipse:

Each successive eclipse in the saros series is systematically about 120° to the west of the previous one, and a little south (or north, depending on the series). The series continues like this until the eclipse paths hit one of the poles, at which point the series ends:
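The westward drift happens because a saros is not a whole number of days. Here is a rough sketch of the arithmetic. It assumes the mean synodic month and treats the extra fraction of a day simply as that fraction of one rotation:

```python
SYNODIC_MONTH = 29.530588853  # days, approximate mean value (assumed)
SAROS_MONTHS = 223

saros_days = SAROS_MONTHS * SYNODIC_MONTH
leftover_day = saros_days % 1  # fraction of a day beyond a whole number of days

# The Earth turns 360° per (solar) day, so when the eclipse repeats the Earth
# has turned about leftover_day * 360° further east: the shadow lands that far west.
westward_shift_deg = leftover_day * 360

print(f"1 saros ≈ {saros_days:.2f} days ≈ {saros_days / 365.25:.2f} years")
print(f"extra ≈ {leftover_day:.3f} day, so ≈ {westward_shift_deg:.0f}° further west")
```

This mean-motion estimate gives roughly 116°, in line with the “about 120°” above. Individual eclipses can deviate somewhat, because the real lunar orbit is not uniform.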

Any given eclipse is in a saros series. The next eclipse is in series 139 (the numbering scheme for these series was set in 1955—with series 0 chosen, quite arbitrarily, to be the one that starts just after 3000 BC):

There are 71 eclipses in this saros series, running from the year 1501 to the year 2763 (a span of about 1262 years):

Not all these eclipses are total, however; if we plot the magnitudes of all the eclipses, we see that the partial ones appear only at the ends of the saros series:

If we look at the next few eclipses, we’ll see that they are part of all sorts of different saros series:

Here are dates for eclipses in a sequence of different saros series:

Right now there are 40 saros series active:

Any given eclipse could be specified by its “index number” within its saros series. But in the mid-20th century it was realized that there’s a more convenient and robust way to label eclipses, using a combination of “saros number” and what’s called “inex number”.

As we discussed previously, for an eclipse to occur, a new moon must happen at a time when the Moon is close to the plane of the ecliptic. The average time between new moons is the so-called synodic month:

Meanwhile, the average time between the Moon’s crossings of the plane of the ecliptic is half the so-called draconic month:

Given a particular eclipse, the time before an approximate “repeat eclipse” will correspond to a coincidence between an integer multiple of the synodic month, and of half the draconic month. And to figure out when these coincidences will occur is essentially a question of number theory.

Let’s compute the continued fraction expansion:

From this we can derive a sequence of rational approximations:

These approximations get progressively better:
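For readers who want to reproduce this number theory outside the Wolfram Language, here is a Python sketch using exact rational arithmetic. The month lengths are assumed approximate mean values, so later terms of the expansion depend on the values chosen. The quantity expanded is the ratio of the synodic month to half the draconic month:

```python
from fractions import Fraction

# Approximate mean month lengths in days (assumed values; they drift slowly)
SYNODIC_MONTH = Fraction("29.530588853")
DRACONIC_MONTH = Fraction("27.212220817")


def continued_fraction(x, n_terms):
    """First n_terms partial quotients of the continued fraction of x > 0."""
    terms = []
    for _ in range(n_terms):
        a = x.numerator // x.denominator  # integer part
        terms.append(a)
        rest = x - a
        if rest == 0:
            break
        x = 1 / rest
    return terms


def convergents(terms):
    """Successive rational approximations p/q built from the partial quotients."""
    p_prev, p = 1, terms[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result


# Number of half-draconic months per synodic month
ratio = SYNODIC_MONTH / (DRACONIC_MONTH / 2)
terms = continued_fraction(ratio, 9)
approx = convergents(terms)

print(terms)
for i, c in enumerate(approx, start=1):
    print(i, c, float(c))
```

With these inputs the partial quotients begin 2, 5, 1, 6, 1, 1, 1, 1, 1. The 8th and 9th convergents are therefore 484/223 (the saros) and 777/358 (the inex).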

The 8th one is 484/223—and this corresponds to the saros cycle, which reflects the close similarity of 223 synodic months and 242 draconic months:

But now let’s look at the 9th rational approximation: 777/358. This reflects the coincidence:

And now this coincidence defines another cycle—which is the inex cycle. There are lots of other cycles one can identify—but all the common ones can be expressed as linear combinations of the saros and inex cycles.

We saw above how eclipses occur in saros series. But we now see that they will also occur in inex series. And a convenient way to specify an eclipse is to say what saros series and what inex series it appears in. The saros and inex series are numbered according to when they started, with the 0th saros series by convention being the one that spans:

With this setup, the April 2024 eclipse can then be specified by its saros and inex numbers:

But how does this fit in with other eclipses? Here’s a plot of the inex and saros numbers of all eclipses between 1000 AD and 3000 AD (with the April 2024 eclipse indicated in red):

Each saros series shows up as a vertical line of eclipses, and each inex series as a horizontal line. The finite overall date range for the picture leads to the diagonal cutoffs on each side.

For each (saros, inex) number pair there’ll be some kind of eclipse. But most of the eclipses won’t be total. Here’s where total eclipses occur in the (saros, inex) plane:

And here’s a complete color-coded “panorama” of different kinds of eclipses:

(Note that—as we discussed above—the eclipses at the ends of saros series are partial.)

By the way, when we’re discussing eclipses in the future, there’s a subtlety to mention. By computing the overall motion of the Earth and Moon we can work out when there’ll be an eclipse, and where in space the cone of shadow associated with it will be. But it’s a different question where geographically that shadow will land on the Earth because that depends on the rotation of the Earth—which isn’t precisely predictable, as illustrated by the somewhat haphazard way in which leap seconds have had to be added to align “universal” UTC time with time based on the rotation of the Earth and the length of a day. Is it a large effect? Over the past 500 years the discrepancy has been about 176 seconds; over the next 500 years it could easily be as much as 1000 seconds. Meanwhile, for an eclipse on the equator each second corresponds to a change of 0.29 miles in where the eclipse will be total (the change is less farther from the equator). In what we’ve shown here we’ve made some standard assumptions about how the rotation of the Earth is slowing down—but ultimately this isn’t predictable, making the locations of eclipses 500 years from now uncertain in the east-west direction by as much as several hundred miles.
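The size of this effect is simple arithmetic. This minimal sketch treats the Earth as a sphere with its equatorial radius and uses one rotation per sidereal day; both are simplifying assumptions. It reproduces the 0.29 miles per second figure and shows how the shift shrinks with latitude:

```python
import math

EARTH_EQUATORIAL_RADIUS_MI = 3963.19  # miles
SIDEREAL_DAY_S = 86164.0905           # seconds per rotation relative to the stars

# Eastward speed of the ground at the equator, in miles per second
MI_PER_SECOND_AT_EQUATOR = 2 * math.pi * EARTH_EQUATORIAL_RADIUS_MI / SIDEREAL_DAY_S


def east_west_shift_miles(rotation_error_s, latitude_deg):
    """Ground shift of an eclipse track caused by an error in Earth-rotation timing.

    Circles of latitude shrink toward the poles, so the shift scales with cos(latitude).
    """
    return rotation_error_s * MI_PER_SECOND_AT_EQUATOR * math.cos(math.radians(latitude_deg))


print(f"{MI_PER_SECOND_AT_EQUATOR:.3f} miles per second at the equator")
for lat in (0, 30, 45, 60):
    print(f"1000 s error at {lat}°: {east_west_shift_miles(1000, lat):.0f} miles")
```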

The Eclipse from First Principles

The function SolarEclipse in the Wolfram Language immediately tells us when eclipses occur. But we can also deduce this information from other “lower-level” functions. In particular, we know that an eclipse (of at least some kind) occurs if the angular separation between the Sun and the Moon in the sky is smaller than the sum of their apparent radii (about 0.5° in all). The angular separation depends on where you are on the Earth. Let’s pick a location from which we know an eclipse will be visible:
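The same criterion can be extended into a rough classification of what an observer sees. This sketch works purely from the apparent angular radii and the separation of the disk centers, so it ignores effects such as atmospheric refraction. The sample radii are illustrative:

```python
def eclipse_type(sun_radius, moon_radius, separation):
    """Classify what an observer sees from apparent radii and center separation.

    All quantities are angular, in the same units. Returns None when the
    disks don't overlap (separation >= sum of the radii).
    """
    if separation >= sun_radius + moon_radius:
        return None
    if moon_radius >= sun_radius and separation <= moon_radius - sun_radius:
        return "total"    # the Moon's disk completely covers the Sun's
    if moon_radius < sun_radius and separation <= sun_radius - moon_radius:
        return "annular"  # the Moon's disk lies entirely inside the Sun's
    return "partial"


# Illustrative apparent radii in degrees (assumed values)
print(eclipse_type(0.2665, 0.2800, 0.6))  # no eclipse
print(eclipse_type(0.2665, 0.2800, 0.0))  # total
print(eclipse_type(0.2665, 0.2500, 0.0))  # annular
print(eclipse_type(0.2665, 0.2800, 0.3))  # partial
```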

Now let’s plot angular separation for each hour over the course of the next year:

The minima occur once per lunar month, near the time of the new moons. Changing the plot range, we can see that these minima are different in different (lunar) months:

Let’s look at early April in more detail, now sampling every minute:

And what we see here is that the angular separation goes to zero—reflecting the fact that there’s a total eclipse. What are those little glitches? They’re a consequence of the fact that the apparent positions of the Sun and Moon change when they’re close to the horizon because of refraction in the atmosphere. Not including refraction gives a smoother curve:
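To get a feel for the size of refraction, here is a sketch using Bennett’s empirical formula for standard atmospheric conditions. This is a swapped-in approximation, not necessarily what the Wolfram Language uses. Near the horizon the shift is about 34 arcminutes, more than the roughly 32-arcminute apparent diameter of the Sun. At moderate altitudes it drops to around an arcminute:

```python
import math


def bennett_refraction_arcmin(apparent_altitude_deg):
    """Approximate atmospheric refraction (arcminutes) from Bennett's formula.

    Assumes standard temperature and pressure; the input is the apparent
    altitude in degrees. Most useful at low altitudes, where refraction is largest.
    """
    h = apparent_altitude_deg
    return 1 / math.tan(math.radians(h + 7.31 / (h + 4.4)))


for altitude in (0, 1, 5, 10, 30, 60):
    print(altitude, round(bennett_refraction_arcmin(altitude), 2))
```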

We’ve been looking at the angular separation between the centers of the disks of the Sun and Moon in the sky. But what about the actual positions in the sky? Here’s the astro position of the Sun (in local horizon coordinates) at the time and place of maximum eclipse:

And here now is the almost-exactly-equal result for the Moon:

What will it actually look like in the sky? Here’s a graphic showing the disks of the Sun and Moon 15 minutes before the time of maximum eclipse:

Here’s a sequence of images 15 minutes apart:

Analyzing Data from the 2017 Eclipse

I saw the 2017 eclipse from a rather scenic spot near Jackson, Wyoming:

Specifically (according to Wolfram|Alpha accessed through my phone) I was at geo position 43.5125°N 110.6506°W—at an elevation of 7526 ft:

As a check, here’s the expected (ground) elevation at that lat-lon—definitely within bounds, particularly considering I was holding the phone about 5 feet off the ground, etc.:

With my location defined by these coordinates, the predicted arrival time of the eclipse there was then:

And indeed that’s exactly what our precisioneclipse.com site told me at the time:

What would happen at the “moment of totality”? Here’s where the shadow of the Moon was predicted to be (where I was standing is indicated by a red dot; the whole picture is 100 miles across):

Fifteen seconds earlier, the shadow of the Moon would have just “crossed” the row of mountains I could see:

Looking (in exaggerated 3D) at the terrain, here’s the “umbral cone” at the moment of totality for me:

And it was moving at a speed which is effectively a vector difference of the rotation speed of the Earth (corrected for latitude) and the orbital velocity of the Moon:
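A crude version of that vector difference can be sketched in a local east/north frame. The assumptions are significant. The shadow is taken to move at roughly the Moon’s mean orbital speed (about 1.02 km/s) relative to the Earth’s center. Its heading is a hypothetical input. The extra speed that comes from the shadow meeting the curved surface at an angle is ignored:

```python
import math

EARTH_EQUATORIAL_RADIUS_KM = 6378.137
SIDEREAL_DAY_S = 86164.0905
KM_S_TO_MPH = 2236.936
MOON_ORBITAL_SPEED_KM_S = 1.022  # assumed stand-in for the shadow's speed in space


def rotation_velocity(lat_deg):
    """Eastward speed (km/s) of the ground at a given latitude."""
    v_equator = 2 * math.pi * EARTH_EQUATORIAL_RADIUS_KM / SIDEREAL_DAY_S
    return v_equator * math.cos(math.radians(lat_deg))


def umbra_ground_speed_mph(lat_deg, heading_deg, shadow_speed=MOON_ORBITAL_SPEED_KM_S):
    """Speed of the shadow relative to the ground, |v_shadow - v_ground|.

    heading_deg is a hypothetical direction of the shadow's motion, measured
    from due east toward north. Both velocities are treated as horizontal.
    """
    h = math.radians(heading_deg)
    v_east = shadow_speed * math.cos(h) - rotation_velocity(lat_deg)
    v_north = shadow_speed * math.sin(h)
    return math.hypot(v_east, v_north) * KM_S_TO_MPH


# Hypothetical inputs: the observing latitude from the text and an assumed heading
print(round(umbra_ground_speed_mph(43.5125, -20)))
```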

Shown at 5-second intervals for 30 seconds, here’s how the edge of the umbra was moving just before it reached me:

That day, I had brought some not-very-high-tech “equipment” to record the eclipse:

The main video I captured—which was from the iPad—is now stored for posterity in the Wolfram Data Repository:

It was 11 minutes long. Sampling frames from it we get:

To get more of a sense of the eclipse, we can pick out the center column of the video, and arrange it in time:
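This time collage is a simple slit-scan construction. Here is a sketch in plain Python. It assumes the video has already been decoded into grayscale frames, each stored as a list of rows; the decoding itself isn’t shown:

```python
def center_column_collage(frames):
    """Build a time collage: column t of the result is the center column of frame t.

    frames: a sequence of frames, each a list of rows of grayscale values
    (a hypothetical stand-in for decoded video frames).
    """
    height = len(frames[0])
    width = len(frames[0][0])
    mid = width // 2
    return [[frame[row][mid] for frame in frames] for row in range(height)]
```

Because each row of the result follows one fixed row of the scene through time, a boundary (like the edge of totality) that reaches different parts of the scene at different times shows up as a slanted edge.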

Notice that the “band of totality” appears slightly further to the left at the top of the image—reflecting the fact that totality reaches the mountains in the distance (at the top of the image) slightly earlier than it reaches the “foreground” at the bottom of the image.

To be more quantitative, we can measure the mean intensity of the bottom part of the image as a function of time (where here we’ve aligned with the timestamp that records the start time of the video):
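A sketch of the light-curve measurement, using the same frame representation as an assumption. The fraction of the image treated as the “bottom part”, and the frame rate used to turn frame indices into times after the recorded start time, are hypothetical parameters:

```python
def light_curve(frames, start_time, fps, bottom_fraction=0.3):
    """Mean intensity of the bottom part of each frame, paired with its time.

    frames: list of frames (lists of rows of grayscale values).
    start_time: time of the first frame (e.g. from the video's timestamp), in seconds.
    fps, bottom_fraction: hypothetical frame rate and size of the measured region.
    """
    curve = []
    for i, frame in enumerate(frames):
        # First row of the bottom region; always keep at least one row
        start = min(int(len(frame) * (1 - bottom_fraction)), len(frame) - 1)
        region = [value for row in frame[start:] for value in row]
        curve.append((start_time + i / fps, sum(region) / len(region)))
    return curve
```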

We see in this “light curve” a big dip during the period of totality. There’s also a little dip earlier from someone walking in front of the camera. And we also see lots of little glitches that we’ll discuss later.

But what should we expect this light curve to look like? Well, we can predict what the obscuration of the Sun around the time of totality should be, and apart from the 20% effect of “limb darkening”, this should give the intensity of sunlight:
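The obscuration is the fraction of the Sun’s disk area covered by the Moon’s disk. It follows from the standard formula for the overlap area of two circles. This sketch treats the Sun as uniformly bright, so it leaves out limb darkening:

```python
import math


def _clamp(x, lo=-1.0, hi=1.0):
    """Guard against tiny floating-point excursions outside acos's domain."""
    return max(lo, min(hi, x))


def obscuration(sun_radius, moon_radius, separation):
    """Fraction of the Sun's disk area covered by the Moon's disk.

    Apparent radii and center separation must be in the same angular units.
    """
    rs, rm, d = sun_radius, moon_radius, separation
    if d >= rs + rm:
        return 0.0                       # no overlap
    if d <= abs(rs - rm):
        return min(1.0, (rm / rs) ** 2)  # one disk entirely inside the other
    a1 = rs**2 * math.acos(_clamp((d**2 + rs**2 - rm**2) / (2 * d * rs)))
    a2 = rm**2 * math.acos(_clamp((d**2 + rm**2 - rs**2) / (2 * d * rm)))
    a3 = 0.5 * math.sqrt(max(0.0, (-d + rs + rm) * (d + rs - rm) * (d - rs + rm) * (d + rs + rm)))
    return (a1 + a2 - a3) / (math.pi * rs**2)
```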

Rescaling values somewhat, we can compare the curves:

And zooming in on the minute around the beginning of totality, we see:

And, yes indeed, the observed onset of totality seems to agree basically to the second with our prediction!

But even though there’s this agreement, the overall shapes of the observed and predicted light curves definitely aren’t the same. And, yes, this is a story of experimental science. And, arguably, of a mistake I made—of using consumer electronics, optimized for consumer purposes, to make a quantitative scientific measurement. You see, as I now realize, an iPad by default always tries to “get a good picture” by maintaining the brightness of an image independent of overall light level. And while for “consumer purposes” that’s usually the right thing to do, it definitely confuses things if one’s trying to measure the light curve for an eclipse.

And indeed if we look at our “measured light curve” it’s very flat until the period of totality. In other words, the iPad succeeded in maintaining the same image brightness until there just wasn’t enough light at all, at which point the image “faded to black”. (At the end of totality, the iPad gradually realized “yes, there’s more light now”, and the measured light curve slowly climbs back up.)

But what’s with all the glitches we see? They’re already visible in our “video time collage” above. And one thought might be that they’re an actual eclipse phenomenon—perhaps associated with the “shadow bands” that I did indeed see just before this eclipse. But the characteristic shimmering associated with such shadow bands—while very difficult to capture on video, perhaps because actual images aren’t being formed—happens much faster than the glitches we’re seeing.

Looking in a bit more detail, we see that there are upward glitches in the period before totality, and downward ones after. Zooming in on a couple of minutes of the “before” period and a couple of minutes of the “after” period, we see:

And, yes, we can validate that this is an “instrumental phenomenon” by doing a simple experiment—using the very same iPad as in 2017—and continuously sliding a piece of cardboard in front of a light and then capturing video and measuring the light curve for this in-my-basement-style “model eclipse”:

We see the very same kind of glitches as from the actual eclipse video. And presumably in both cases they’re reflections of the automatic exposure control system used by the iPad. (If the individual frames of the video had EXIF metadata, we might be able to see that explicitly, but as it is, there’s only EXIF data for the whole video.) I don’t know in detail how this iPad’s exposure control system works, but what we’re seeing is a kind of overshooting-and-correction that’s very common in all sorts of control systems. If we knew in advance everything that happens in the video, then maybe we could avoid the glitches. But if we’re going to try to maintain light level on an ongoing basis during the recording of the video (perhaps by adjusting the gamma correction that determines how raw sensor values are translated to pixel values), then control theory most likely implies that glitches are inevitable.
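Here is a toy illustration of how overshoot-and-correction can produce glitches like these. It is purely hypothetical and says nothing about how the iPad actually works. The made-up controller adjusts gain only in discrete steps, and only when brightness drifts outside a deadband around a target. With steadily fading light it produces upward jumps, and with recovering light downward ones. That is the same pattern as before and after totality:

```python
def toy_auto_exposure(light_levels, target=0.5, deadband=0.05, step=1.2):
    """Hypothetical auto-exposure with a deadband and discrete gain steps.

    Each frame is recorded with the current gain; only afterwards does the
    controller react, stepping the gain when the recorded brightness has
    drifted outside target +/- deadband.
    """
    gain = target / light_levels[0]
    recorded = []
    for level in light_levels:
        brightness = gain * level
        recorded.append(brightness)
        if brightness < target - deadband:
            gain *= step  # too dark: brighten the following frames
        elif brightness > target + deadband:
            gain /= step  # too bright: darken the following frames
    return recorded


def jumps(series, threshold=0.03):
    """Frame-to-frame changes larger than threshold (the "glitches")."""
    return [b - a for a, b in zip(series, series[1:]) if abs(b - a) > threshold]


fading = [1.0 - 0.004 * i for i in range(200)]  # light dimming toward totality
recovering = list(reversed(fading))             # light returning afterwards

print(jumps(toy_auto_exposure(fading)))      # all positive
print(jumps(toy_auto_exposure(recovering)))  # all negative
```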

My First Eclipse

So many years later, I still remember it well. I was 6 years old (almost 7), walking the couple of blocks to school (yes, on my own, which kids in England in those days did). It happened when I was walking under a tree (and, yes, I still remember exactly where). There were lots of dappled patches of light on the ground. And something looked strange about them. And suddenly I realized what it was—and I still have an image of it in my mind today. All the patches of light had the same bite taken out of them. And it didn’t take me long to look up at the Sun, and see that it too had a bite taken out of it.

Being already something of a science enthusiast, I’d heard of eclipses, and realized this must be one. I arrived at school a few minutes later, and regaled the other kids with my “discovery”. But despite the obviousness (at least to me) of what was going on, I wasn’t widely believed. And, yes, that was a very educational experience. But that’s a different story…

So what was that eclipse? Well, it was the one in May 1966:

It was a partial eclipse—visible from England:

The geo location of my “discovery tree” was 51.7636° N 1.2558° W. So now we can compute the magnitude of the eclipse there that morning:

I believe school started at 9am, so what I saw was an eclipse with rough magnitude:

The Sun (and Moon) were about 50° above the horizon:

And the Moon was poking into the disk of the Sun:

Thirty minutes later the Moon had poked a little further into the disk of the Sun. But after a bit more than an hour, the whole Sun was back, and the eclipse was over:

And it would be 25 years before I’d see another eclipse—though this time a total one.

By the way, this is me back at the time of my first eclipse—captured in a long-before-those-were-popular “selfie”, taken with a film camera and manual focus, and, it seems, a lot of concentration (and, yes, I think I still make the same strange expression when I’m concentrating hard today):

Can AI Solve Science?

Stephen Wolfram — Tue, 05 Mar 2024 22:21:12 +0000

Note: Click any diagram to get Wolfram Language code to reproduce it. Wolfram Language code for training the neural nets used here is also available (requires GPU).

Won’t AI Eventually Be Able to Do Everything?

Particularly given its recent surprise successes, there’s a somewhat widespread belief that eventually AI will be able to “do everything”, or at least everything we currently do. So what about science? Over the centuries we humans have made incremental progress, gradually building up what’s now essentially the single largest intellectual edifice of our civilization. But despite all our efforts, there are still all sorts of scientific questions that remain. So can AI now come in and just solve all of them?

To this ultimate question we’re going to see that the answer is inevitably and firmly no. But that certainly doesn’t mean AI can’t importantly help the progress of science. At a very practical level, for example, LLMs provide a new kind of linguistic interface to the computational capabilities that we’ve spent so long building in the Wolfram Language. And through their knowledge of “conventional scientific wisdom” LLMs can often provide what amounts to very high-level “autocomplete” for filling in “conventional answers” or “conventional next steps” in scientific work.

But what I want to do here is to discuss what amount to deeper questions about AI in science. Three centuries ago science was transformed by the idea of representing the world using mathematics. And in our times we’re in the middle of a major transformation to a fundamentally computational representation of the world (and, yes, that’s what our Wolfram Language computational language is all about). So how does AI stack up? Should we think of it essentially as a practical tool for accessing existing methods, or does it provide something fundamentally new for science?

My goal here is to explore and assess what AI can and can’t be expected to do in science. I’m going to consider a number of specific examples, simplified to bring out the essence of what is (or isn’t) going on. I’m going to talk about intuition and expectations based on what we’ve seen so far. And I’m going to discuss some of the theoretical—and in some ways philosophical—underpinnings of what’s possible and what’s not.

So what do I actually even mean by “AI” here? In the past, anything seriously computational was often considered “AI”, in which case, for example, what we’ve done for so long with our Wolfram Language computational language would qualify—as would all my “ruliological” study of simple programs in the computational universe. But here for the most part I’m going to adopt a narrower definition—and say that AI is something based on machine learning (and usually implemented with neural networks), that’s been incrementally trained from examples it’s been given. Often I’ll add another piece as well: that those examples include either a large corpus of human-generated scientific text, etc., or a corpus of actual experience about things that happen in the world—or, in other words, that in addition to being a “raw learning machine” the AI is something that’s already learned from lots of human-aligned knowledge.

OK, so we’ve said what we mean by AI. So now what do we mean by science, and by “doing science”? Ultimately it’s all about taking things that are “out there in the world” (and usually the natural world) and having ways to connect or translate them to things we can think or reason about. But there are several, rather different, common “workflows” for actually doing science. Some center around prediction: given observed behavior, predict what will happen; find a model that we can explicitly state that says how a system will behave; given an existing theory, determine its specific implications. Other workflows are more about explanation: given a behavior, produce a human-understandable narrative for it; find analogies between different systems or models. And still other workflows are more about creating things: discover something that has particular properties; discover something “interesting”.

In what follows we’ll explore these workflows in more detail, seeing how they can (or cannot) be transformed—or informed—by AI. But before we get into this, we need to discuss something that looms over any attempt to “solve science”: the phenomenon of computational irreducibility.

The Hard Limit of Computational Irreducibility

Often in doing science there’s a big challenge in finding the underlying rules by which some system operates. But let’s say we’ve found those rules, and we’ve got some formal way to represent them, say as a program. Then there’s still a question of what those rules imply for the actual behavior of the system. Yes, we can explicitly apply the rules step by step and trace what happens. But can we—in one fell swoop—just “solve everything” and know how the system will behave?

To do that, we in a sense have to be “infinitely smarter” than the system. The system has to go through all those steps—but somehow we can “jump ahead” and immediately figure out the outcome. A key idea—ultimately supported at a foundational level by our Physics Project—is that we can think of everything that happens as a computational process. The system is doing a computation to determine its behavior. We humans—or, for that matter, any AIs we create—also have to do computations to try to predict or “solve” that behavior. But the Principle of Computational Equivalence says that these computations are all at most equivalent in their sophistication. And this means that we can’t expect to systematically “jump ahead” and predict or “solve” the system; it inevitably takes a certain irreducible amount of computational work to figure out what exactly the system will do. And so, try as we might, with AI or otherwise, we’ll ultimately be limited in our “scientific power” by the computational irreducibility of the behavior.

But given computational irreducibility, why is science actually possible at all? The key fact is that whenever there’s overall computational irreducibility, there are also an infinite number of pockets of computational reducibility. In other words, there are always certain aspects of a system about which things can be said using limited computational effort. And these are what we typically concentrate on in “doing science”.

But inevitably there are limits to this—and issues that run into computational irreducibility. Sometimes these manifest as questions we just can’t answer, and sometimes as “surprises” we couldn’t see coming. But the point is that if we want to “solve everything” we’ll inevitably be confronted with computational irreducibility, and there just won’t be any way—with AI or otherwise—to shortcut just simulating the system step by step.

There is, however, a subtlety here. What if all we ever want to know about are things that align with computational reducibility? A lot of science—and technology—has been constructed specifically around computationally reducible phenomena. And that’s for example why things like mathematical formulas have been able to be as successful in science as they have.

But we certainly know we haven’t yet solved everything we want in science. And in many cases it seems like we don’t really have a choice about what we need to study; nature, for example, forces it upon us. And the result is that we inevitably end up face-to-face with computational irreducibility.

As we’ll discuss, AI has the potential to give us streamlined ways to find certain kinds of pockets of computational reducibility. But there’ll always be computational irreducibility around, leading to unexpected “surprises” and things we just can’t quickly or “narratively” get to. Will this ever end? No. There’ll always be “more to discover”. Things that need more computation to reach. Pockets of computational reducibility that we didn’t know were there. And ultimately—AI or not—computational irreducibility is what will prevent us from ever being able to completely “solve science”.

There’s a curious historical resonance to all this. Back at the beginning of the twentieth century, there was a big question of whether all of mathematics could be “mechanically solved”. The arrival of Gödel’s theorem, however, seemed to establish that it could not. And now that we know that science also ultimately has a computational structure, the phenomenon of computational irreducibility—which is, in effect, a sharpening of Gödel’s theorem—shows that it too cannot be “mechanically solved”.

We can still ask, though, whether the mathematics—or science—that humans choose to study might manage to live solely in pockets of computational reducibility. But in a sense the ultimate reason that “math is hard” is that we’re constantly seeing evidence of computational irreducibility: we can’t get around actually having to compute things. Which is, for example, not what methods like neural net AI (at least without the help of tools like Wolfram Language) are good at.

Things That Have Worked in the Past

Before getting into the details of what modern machine-learning-based AI might be able to do in “solving science”, it seems worthwhile to recall some of what’s worked in the past—not least as a kind of baseline for what modern AI might now be able to add.

I myself have been using computers and computation to discover things in science for more than four decades now. My first big success came in 1981 when I decided to try enumerating all possible rules of a certain kind (elementary cellular automata) and running them on a computer to see what they did:

I’d assumed that with simple underlying rules, the final behavior would be correspondingly simple. But in a sense the computer didn’t assume that: it just enumerated rules and computed results. And so even though I never imagined it would be there, it was able to “discover” something like rule 30.
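The kind of enumeration described here is easy to sketch. This Python version runs an elementary cellular automaton from a single black cell. The bits of the rule number give the new cell value for each of the 8 possible three-cell neighborhoods, so looping `rule` over `range(256)` would enumerate every rule:

```python
def run_elementary_ca(rule, steps):
    """Evolve an elementary cellular automaton from a single black cell.

    The row is 2*steps+1 cells wide, so the pattern (which grows by at most
    one cell per side per step) never reaches the edges.
    """
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1
    history = [row]
    for _ in range(steps):
        padded = [0] + row + [0]
        # New value = bit of the rule number indexed by the (left, center, right) neighborhood
        row = [(rule >> (padded[i] * 4 + padded[i + 1] * 2 + padded[i + 2])) & 1
               for i in range(width)]
        history.append(row)
    return history


for r in run_elementary_ca(30, 15):
    print("".join("█" if c else " " for c in r))
```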

Over and over again I have had similar experiences: I can’t see how some system can manage to do anything “interesting”. But when I systematically enumerate possibilities, there it is: something unexpected, interesting—and “clever”—effectively discovered by computer.

In the early 1990s I wondered what the simplest possible universal Turing machine might be. I would never have been able to figure it out myself. The machine that had held the record since the early 1960s had 7 states and 4 colors. But the computer let me discover, just by systematic enumeration, the 2-state, 3-color machine that in 2007 was proved universal (and, yes, it’s the simplest possible universal Turing machine).

In 2000 I was interested in what the simplest possible axiom system for logic (Boolean algebra) might be. The simplest known up to that time involved 9 binary (Nand) operations. But by systematically enumerating possibilities, I ended up finding the single 6-operation axiom (which I proved correct using automated theorem proving). Once again, I had no idea this was “out there”, and certainly I would never have been able to construct it myself. But just by systematic enumeration the computer was able to find what seemed to me like a very “creative” result.
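The axiom is usually quoted as ((p·q)·r)·(p·((p·r)·p)) = r, with · standing for Nand; that form is assumed here. The truth-table check sketched below confirms only that the identity holds in Boolean algebra. It does not show that the identity is a complete axiom system, which is what required the automated theorem proving:

```python
from itertools import product


def nand(a, b):
    return not (a and b)


def axiom_holds(p, q, r):
    """Check ((p·q)·r)·(p·((p·r)·p)) == r with · as Nand (6 Nand operations)."""
    lhs = nand(nand(nand(p, q), r), nand(p, nand(nand(p, r), p)))
    return lhs == r


print(all(axiom_holds(p, q, r) for p, q, r in product([False, True], repeat=3)))
```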

In 2019 I was doing another systematic enumeration, now of possible hypergraph rewriting rules that might correspond to the lowest-level structure of our physical universe. When I looked at the geometries that were generated I felt like as a human I could roughly classify what I saw. But were there outliers? I turned to something closer to “modern AI” to do the science—making a feature space plot of visual images:

It needed me as a human to interpret it, but, yes, there were outliers that had effectively been “automatically discovered” by the neural net that was making the feature space plot.

I’ll give one more example—of a rather different kind—from my personal experience. Back in 1987—as part of building Version 1.0 of what’s now Wolfram Language—we were trying to develop algorithms to compute hundreds of mathematical special functions over very broad ranges of arguments. In the past, people had painstakingly computed series approximations for specific cases. But our approach was to use what amounts to machine learning, burning months of computer time fitting parameters in rational approximations. Nowadays we might do something similar with neural nets rather than rational approximations. But in both cases the concept is to find a general model of the “world” one’s dealing with (here, values of special functions)—and try to learn the parameters in the model from actual data. It’s not exactly “solving science”, and it wouldn’t even allow one to “discover the unexpected”. But it’s a place where “AI-like” knowledge of general expectations about smoothness or simplicity lets one construct the analog of a scientific model.

Can AI Predict What Will Happen?

It’s not the only role of science—and in the sections that follow we’ll explore others. But historically what’s often been viewed as a defining feature of successful science is: can it predict what will happen? So now we can ask: does AI give us a dramatically better way to do this?

In the simplest case we basically want to use AI to do inductive inference. We feed in the results of a bunch of measurements, then ask the AI to predict the results of measurements we haven’t yet done. At this level, we’re treating the AI as a black box; it doesn’t matter what’s happening inside; all we care about is whether the AI gives us the right answer. We might think that somehow we can set the AI up so that it “isn’t making any assumptions” and is just “following the data”. But it’s inevitable that there’ll be some underlying structure in the AI that makes it ultimately assume some kind of model for the data.

Yes, there can be a lot of flexibility in this model. But one can’t have a truly “model-less model”. Perhaps the AI is based on a huge neural network, with billions of numerical parameters that can get tweaked. Perhaps even the architecture of the network can change. But the whole neural net setup inevitably defines an ultimate underlying model.

Let’s look at a very simple case. Let’s imagine our “data” is the blue curve here—perhaps representing the motion of a weight suspended on a spring—and that the “physics” tells us it continues with the red curve:

Now let’s take a very simple neural net

and let’s train it using the “blue curve” data above to get a network with a certain collection of weights:

Now let’s apply this trained network to reproduce our original data and extend it:

And what we see is that the network does a decent job of reproducing the data it was trained on, but when it comes to “predicting the future” it basically fails.

So what’s going on here? Did we just not train long enough? Here’s what happens with progressively more rounds of training:

It doesn’t seem like this helps much. So maybe the problem is that our network is too small. Here’s what happens with networks having a series of sizes:

And, yes, larger sizes help. But they don’t solve the problem of making our prediction successful. So what else can we do? Well, one feature of the network is its activation function: how we determine the output at each node from the weighted sum of inputs. Here are some results with various (popular) activation functions:

And there’s something notable here—that highlights the idea that there are “no model-less models”: different activation functions lead to different predictions, and the form of the predictions seems to be a direct reflection of the form of the activation function. And indeed there’s no magic here; it’s just that the neural net corresponds to a function whose core elements are activation functions.

So, for example, the network

corresponds to the function

where ϕ represents the activation function used in this case.
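To see concretely why the predictions mirror the activation function, here is a minimal Python sketch, assuming ReLU as ϕ and a one-hidden-layer net with arbitrary illustrative weights (hypothetical values, not the trained weights from the figure). Written out, the net is just a weighted sum of shifted copies of ϕ. So beyond the last point where some hidden unit switches on or off, it can only continue as a straight line, whatever the data did.

```python
# Minimal sketch: a one-hidden-layer net written out as an explicit function.
# Hypothetical weights; ReLU is assumed as the activation function phi.

def relu(z: float) -> float:
    return max(0.0, z)

# Hidden units as (input weight w_i, bias c_i); output weights v_i and bias b.
HIDDEN = [(1.0, -1.0), (-2.0, 3.0), (0.5, 0.5), (1.5, -4.0)]
OUT_W = [0.8, -0.3, 1.2, -0.7]
OUT_B = 0.1

def net(x: float) -> float:
    """f(x) = b + sum_i v_i * phi(w_i * x + c_i)"""
    return OUT_B + sum(v * relu(w * x + c) for v, (w, c) in zip(OUT_W, HIDDEN))

def kinks() -> list:
    """Inputs where some hidden unit switches on or off (w_i * x + c_i = 0)."""
    return sorted(-c / w for w, c in HIDDEN)

def second_difference(f, x: float, h: float = 0.25) -> float:
    """Discrete curvature: zero (up to rounding) wherever f is linear on [x - h, x + h]."""
    return f(x + h) - 2 * f(x) + f(x - h)

if __name__ == "__main__":
    last = kinks()[-1]
    for x in (last + 1, last + 10, last + 100):
        print(f"x = {x:7.2f}   f(x) = {net(x):9.4f}   curvature = {second_difference(net, x):.1e}")
```

With a saturating ϕ such as tanh, the same construction would instead level off to a constant for large x. Either way, the shape of the extrapolation is inherited from ϕ rather than from the data.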

Of course, the idea of approximating one function by some combination of standard functions is extremely old (think: epicycles and before). Neural nets allow one to use more complicated (and hierarchical) combinations of more complicated and nonlinear functions, and provide a more streamlined way of “fitting all the parameters” that are involved. But at a fundamental level it’s the same idea.

And for example here are some approximations to our “data” constructed in terms of more straightforward mathematical functions:

These have the advantage that it’s quite easy to state “what each model is” just by “giving its formula”. But just as with our neural nets, there are problems in making predictions.
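The same failure is easy to reproduce with an explicit formula. This minimal Python sketch assumes a degree-7 polynomial as the model and one period of sin(x) as the training window; neither choice is taken from the figures. A least-squares fit matches the data closely inside the window, but it diverges wildly once we leave it.

```python
# Minimal sketch: least-squares polynomial fit of sin(x) on [0, 2*pi],
# then evaluation outside that window. Pure Python, no external libraries.
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_poly(xs, ys, degree):
    """Least-squares polynomial via the normal equations (adequate for small degree)."""
    lo, hi = min(xs), max(xs)
    ts = [(2 * x - lo - hi) / (hi - lo) for x in xs]   # rescale to [-1, 1] for conditioning
    n = degree + 1
    ata = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    aty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    coeffs = solve(ata, aty)

    def p(x):
        t = (2 * x - lo - hi) / (hi - lo)
        return sum(c * t ** i for i, c in enumerate(coeffs))
    return p

xs = [2 * math.pi * k / 200 for k in range(201)]     # training window: one period
p = fit_poly(xs, [math.sin(x) for x in xs], degree=7)

inside = max(abs(p(x) - math.sin(x)) for x in xs)
outside = abs(p(4 * math.pi) - math.sin(4 * math.pi))
print(f"max error inside window: {inside:.1e}; error at x = 4*pi: {outside:.1e}")
```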

(By the way, there is a whole range of methods for things like time series prediction, involving ideas like “fitting to recurrence relations”—and, in modern times, using transformer neural nets. And while some of these methods happen to be able to capture a periodic signal like a sine wave well, one doesn’t expect them to be broadly successful in accurately predicting functions.)
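As a contrast, here is a minimal Python sketch of the “recurrence relation” idea, assuming evenly sampled data sin(n h). Every such sequence satisfies x[n] = 2cos(h)·x[n−1] − x[n−2] exactly, so a two-term linear recurrence fitted by least squares recovers that relation and extrapolates essentially perfectly. The success comes from the model class happening to contain sine waves, not from any general ability to predict.

```python
# Minimal sketch: fit x[n] ~ a*x[n-1] + b*x[n-2] by least squares on a window
# of sine samples, then iterate the recurrence forward past the window.
import math

def fit_recurrence(xs):
    """Least-squares (a, b), solving the 2x2 normal equations by Cramer's rule."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(xs)):
        u, v, y = xs[n - 1], xs[n - 2], xs[n]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        r1 += u * y
        r2 += v * y
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det

def extend(xs, a, b, steps):
    """Append `steps` predicted values generated by the fitted recurrence."""
    out = list(xs)
    for _ in range(steps):
        out.append(a * out[-1] + b * out[-2])
    return out

h = 0.1
data = [math.sin(n * h) for n in range(60)]      # the observed window
a, b = fit_recurrence(data)
pred = extend(data, a, b, 200)                   # predicting "the future"
worst = max(abs(pred[n] - math.sin(n * h)) for n in range(len(pred)))
print(f"a = {a:.6f} (2cos h = {2 * math.cos(h):.6f}), b = {b:.6f}, worst error = {worst:.1e}")
```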

OK, one might say, perhaps we’re trying to use—and train—our neural nets in too narrow a way. After all, it seems as if it was critical to the success of ChatGPT to have a large amount of training data about all kinds of things, not just some narrow specific area. Presumably, though, what that broad training data did was to let ChatGPT learn the “general patterns of language and common sense”, which it just wouldn’t be able to pick up from narrower training data.

So what’s the analog for us here? It might be that we’d want our neural net to have a “general idea of how functions work”—for example to know about things like continuity of functions, or, for that matter, periodicity or symmetry. So, yes, we can go ahead and train not just on a specific “window” of data like we did above, but on whole families of functions—say collections of trigonometric functions, or perhaps all the built-in mathematical functions in the Wolfram Language.

And, needless to say, if we do this, we’ll surely be able to successfully predict our sine curve above—just as we would if we were using traditional Fourier analysis with sine curves as our basis. But is this “doing science”?

In essence it’s saying, “I’ve seen something like this before, so I figure this is what’s going to happen now”. And there’s no question that can be useful; indeed it’s an automated version of a typical thing that a human experienced in some particular area will be able to do. We’ll return to this later. But for now the main point is that at least when it comes to things like predicting functions, it doesn’t seem as if neural nets—and today’s AIs—can in any obvious way “see further” than what goes into their construction and training. There’s no “emergent science”; it’s just fairly direct “pattern matching”.

Predicting Computational Processes

Predicting a function is a particularly austere task, and one might imagine that “real processes”—for example in nature—would have more “ambient structure” that an AI could use to get a “foothold” for prediction. And as an example of what we might think of as “artificial nature” we can consider computational systems like cellular automata. Here’s an example of what a particular cellular automaton rule does, with a particular initial condition:

There’s a mixture here of simplicity and complexity. And as humans we can readily predict what’s going to happen in the simple parts, but basically can’t say much about the other parts. So how would an AI do?

Clearly if our “AI” can just run the cellular automaton rule then it will be able to predict everything, though with great computational effort. But the real question is whether an AI can shortcut things to make successful predictions without doing all that computational work—or, put another way, whether the AI can successfully find and exploit pockets of computational reducibility.
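For reference, “just running the rule” is itself very simple. Here is a minimal Python sketch of an elementary (two-color, nearest-neighbor) cellular automaton with cyclic boundaries. The rule number and initial condition are assumptions for illustration, since the specific rule discussed here appears only in a figure. The catch is that the work grows with every step we want to predict.

```python
# Minimal sketch: explicitly running an elementary cellular automaton.
# Rule numbering follows the usual convention: the new cell value is bit
# (4*left + 2*center + right) of the rule number.

def step(cells, rule):
    """One update with cyclic boundary conditions."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def evolve(cells, rule, steps):
    """Return the initial row followed by `steps` successive rows."""
    rows = [cells]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows

if __name__ == "__main__":
    width = 63
    init = [0] * width
    init[width // 2] = 1                 # single black cell (assumed initial condition)
    for row in evolve(init, 30, 20):
        print("".join("#" if c else "." for c in row))
```

Calling evolve(init, 90, 30) instead gives the nested rule 90 pattern, a computationally reducible case where the number of black cells at step t is 2 raised to the number of 1 bits in t; no comparable shortcut is known for rule 30.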

So, as a specific experiment, let’s set up a neural net to try to efficiently predict the behavior of our cellular automaton. Our network is basically a straightforward—though “modern”—convolutional autoencoder, with 59 layers and a total of about 800,000 parameters:

It’s trained much like an LLM. We generated lots of examples of the evolution of our cellular automaton, then showed the network the “top half” of each one and tried to get it to continue this successfully, predicting the “bottom half”. In the specific experiment we did, we gave 32 million examples of 64-cell-wide cellular automaton evolution. (And, yes, this number of examples is tiny compared to all possible initial configurations.) Then we tried feeding in “chunks” of cellular automaton evolution 64 cells wide and 64 steps long—and looked to see what probabilities the network assigned to different possible continuations.
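The data setup just described can be sketched as follows. This minimal sketch rests on stated assumptions: a two-color elementary rule (rule 30 is only a placeholder), random 64-cell initial conditions with cyclic boundaries, and 64 steps for each half. The actual rule, boundary handling and split used in the experiment are not specified here.

```python
# Minimal sketch: build (prompt, target) pairs by evolving random initial
# conditions and splitting each evolution into a top half and a bottom half.
import random

def step(cells, rule):
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def make_example(rng, rule=30, width=64, half=64):
    """Return (top, bottom): two lists of `half` rows, each `width` cells wide."""
    rows = [[rng.randint(0, 1) for _ in range(width)]]
    for _ in range(2 * half - 1):
        rows.append(step(rows[-1], rule))
    return rows[:half], rows[half:]

rng = random.Random(0)
dataset = [make_example(rng) for _ in range(20)]   # tiny stand-in for 32 million examples
top, bottom = dataset[0]
print(len(top), "prompt rows;", len(bottom), "target rows;", len(top[0]), "cells wide")
```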

Here are some results for a sequence of different initial conditions:

And what we see is what we might expect: when the behavior is simple enough, the network basically gets it right. But when the behavior is more complicated, the network usually doesn’t do so well with it. It still often gets it at least “vaguely right”—but the details aren’t there.

Perhaps, one might think, the network just wasn’t trained for long enough, or with enough examples. And to get some sense of the effect of more training, here’s how the predicted probabilities evolve with successive quarter-million rounds of training:

These should be compared to the exact result:

And, yes, with more training there is improvement, but by the end it seems like it probably won’t get much better. (Though its loss curve does show some sudden downward jumps during the course of training, presumably as “discoveries” are made—and we can’t be sure there won’t be more of these.)

It’s extremely typical of machine learning that it manages to do a good job of getting things “roughly right”. But nailing the details is not what machine learning tends to be good at. So when what one’s trying to do depends on that, machine learning will be limited. And in the prediction task we’re considering here, the issue is that once things go even slightly off track, everything basically just gets worse from there on out.
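The “slightly off track” problem can be seen directly in a minimal sketch, again assuming rule 30 as a placeholder on a 64-cell ring: run two copies that differ in a single cell and track where they disagree. For rule 30 the new value is left XOR (center OR right), so a difference in a cell's left neighbor always flips its new value. The right edge of the disagreement therefore advances one cell per step, and a single early mistake keeps spreading.

```python
# Minimal sketch: damage spreading from one flipped cell in rule 30.
import random

def step(cells, rule=30):
    """One update of an elementary CA with cyclic boundaries (rule 30 by default)."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def damage(width=64, steps=30, flip_at=10, seed=1):
    """Positions where two runs disagree at each step, after flipping one initial cell."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(width)]
    b = list(a)
    b[flip_at] ^= 1                      # the single "slightly off" cell
    history = []
    for _ in range(steps + 1):
        history.append([i for i in range(width) if a[i] != b[i]])
        a, b = step(a), step(b)
    return history

if __name__ == "__main__":
    for t, diff in enumerate(damage()):
        print(f"step {t:2d}: {len(diff):2d} cells differ")
```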

Identifying Computational Reducibility

Computational reducibility is at the center of what we normally think of as “doing science”. Because it’s not only responsible for letting us make predictions, it’s also what lets us identify regularities, make models and compressed summaries of what we see—and develop understanding that we can capture in our minds.

But how can we find computational reducibility? Sometimes it’s very obvious. Like when we make a visualization of some behavior (like the cellular automaton evolution above) and immediately recognize simple features in it. But in practice computational reducibility may not be so obvious, and we may have to dig through lots of details to find it. And this is a place where AI can potentially help a lot.

At some level we can think of it as a story of “finding the right parametrization” or the “right coordinate system”. As a very straightforward example, consider the seemingly quite random cloud of points:

Just turning this particular cloud of points to the appropriate angle reveals obvious regularities:

But is there a general way to pick out regularities if they’re there? There’s traditional statistics (“Is there a correlation between A and B?”, etc.). There’s model fitting (“Is this a sum of Gaussians?”). There’s traditional data compression (“Is it shorter after run-length encoding?”). But all of these pick out only rather specific kinds of regularities. So can AI do more? Can it perhaps somehow provide a general way to find regularities?
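As an illustration of how narrow such detectors are, here is a minimal Python sketch of run-length encoding. It compresses a string made of long runs, but gets no traction at all on a strictly alternating string, which is just as regular.

```python
# Minimal sketch: run-length encoding detects one specific kind of regularity.
from itertools import groupby

def rle(s):
    """Encode a string as a list of (symbol, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]

def unrle(pairs):
    """Invert rle()."""
    return "".join(ch * n for ch, n in pairs)

runs = "A" * 32 + "B" * 32    # regularity of the kind RLE detects
stripes = "AB" * 32           # equally regular, but RLE gains nothing
for s in (runs, stripes):
    print(f"{len(s)} symbols -> {len(rle(s))} runs")
```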

To say one’s found a regularity in something is basically equivalent to saying one doesn’t need to specify all the details of the thing: that there’s a reduced representation from which one can reconstruct it. So, for example, given the “points-lie-on-lines” regularity in the picture above, one doesn’t need to separately specify the positions of all the points; one just needs to know that they form stripes with a certain separation.
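The “find the right angle” idea can be sketched as a brute-force search. This minimal sketch assumes points on 10 parallel stripes with unit spacing, rotated by 30° (all illustrative choices). For each candidate angle it projects the points onto the stripe normal and counts distinct projected values at 0.1 resolution. At the right angle the 200 points collapse to just 10 values, which is exactly the reduced representation described above.

```python
# Minimal sketch: recover the stripe direction by scanning projection angles.
import math
import random

def make_points(angle_deg=30.0, stripes=10, per_stripe=20, seed=0):
    """Points on parallel stripes (unit spacing), rotated by angle_deg."""
    rng = random.Random(seed)
    th = math.radians(angle_deg)
    pts = []
    for s in range(stripes):
        for _ in range(per_stripe):
            a, b = float(s), rng.uniform(0.0, 10.0)   # across-stripe, along-stripe
            pts.append((a * math.cos(th) - b * math.sin(th),
                        a * math.sin(th) + b * math.cos(th)))
    return pts

def distinct_projections(pts, angle_deg):
    """Number of distinct projections onto the direction angle_deg, at 0.1 resolution."""
    th = math.radians(angle_deg)
    return len({round(10 * (x * math.cos(th) + y * math.sin(th))) for x, y in pts})

pts = make_points()
scores = {deg: distinct_projections(pts, deg) for deg in range(180)}
best = min(scores, key=scores.get)
print("best angle:", best, "| distinct stripe positions:", scores[best])
```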

OK, so let’s imagine we have an image with a certain number of pixels. We can ask whether there’s a reduced representation that involves less data—from which the image can effectively be reconstructed. And with neural nets there’s what one might think of as a trick for finding such a reduced representation.

The basic idea is to set up a neural net as an autoencoder that takes inputs and reproduces them as outputs. One might think this would be a trivial task. But it’s not, because the data from the input has to flow through the innards of the neural net, effectively being “ground up” at the beginning and “reconstituted” at the end. But the point is that with enough examples of possible inputs, it’s potentially possible to train the neural net to successfully reproduce inputs, and operate as an autoencoder.

But now the idea is to look inside the autoencoder, and to pull out a reduced representation that it’s come up with. As data flows from layer to layer in the neural net, it’s always trying to preserve the information it needs to reproduce the original input. And if a layer has fewer elements, what’s present at that layer must correspond to some reduced representation of the original input.
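The bottleneck idea can be seen in the smallest possible case. This minimal sketch assumes a linear encoder (2 numbers to 1) and a linear decoder (1 number back to 2), trained by plain gradient descent on points that lie on the line y = 2x; the data, initialization and learning rate are illustrative assumptions. Because the data has a one-number description, the single bottleneck element ends up holding it, and reconstruction becomes essentially exact.

```python
# Minimal sketch: a linear autoencoder with a one-element bottleneck,
# trained by gradient descent on mean squared reconstruction error.

def train_autoencoder(data, steps=5000, lr=0.05):
    w_enc = [0.1, 0.1]          # encoder: code = w_enc . x
    w_dec = [0.1, 0.1]          # decoder: x_hat = code * w_dec
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for x in data:
            code = w_enc[0] * x[0] + w_enc[1] * x[1]
            r = [code * w_dec[k] - x[k] for k in range(2)]     # reconstruction residual
            r_dot_dec = r[0] * w_dec[0] + r[1] * w_dec[1]
            for k in range(2):
                g_dec[k] += 2 * r[k] * code / n                # d(loss)/d(w_dec[k])
                g_enc[k] += 2 * r_dot_dec * x[k] / n           # d(loss)/d(w_enc[k])
        w_enc = [w_enc[k] - lr * g_enc[k] for k in range(2)]
        w_dec = [w_dec[k] - lr * g_dec[k] for k in range(2)]
    return w_enc, w_dec

def reconstruction_error(data, w_enc, w_dec):
    total = 0.0
    for x in data:
        code = w_enc[0] * x[0] + w_enc[1] * x[1]
        total += sum((code * w_dec[k] - x[k]) ** 2 for k in range(2))
    return total / len(data)

data = [(t / 10, 2 * t / 10) for t in range(-10, 11)]   # points on y = 2x
w_enc, w_dec = train_autoencoder(data)
print("code for (0.5, 1.0):", w_enc[0] * 0.5 + w_enc[1] * 1.0)
print("mean squared reconstruction error:", reconstruction_error(data, w_enc, w_dec))
```

On data that does not lie on a line, the same one-element bottleneck could only keep the best single direction and would lose the rest, a small-scale analog of an autoencoder failing to compress images without enough underlying regularity.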

Let’s start with a standard modern image autoencoder that’s been trained on a few billion images typical of what’s on the web. Feed it a picture of a cat, and it’ll successfully reproduce something that looks like the original picture:

But in the middle there’ll be a reduced representation, with many fewer pixels—that somehow still captures what’s needed of the cat (here shown with its 4 color channels separated):

We can think of this as a kind of “black-box model” for the cat image. We don’t know what the elements (“features”) in the model mean, but somehow it’s successfully capturing “the essence of the picture”.

So what happens if we apply this to “scientific data”, or for example “artificial natural processes” like cellular automata? Here’s a case where we get successful compression:

In this case it’s not quite so successful:

And in these cases—where there’s underlying computational irreducibility—it has trouble:

But there’s a bit more to this story. You see, the autoencoder we’re using was trained on “everyday images”, not these kinds of “scientific images”. So in effect it’s trying to model our scientific images in terms of constructs like eyes and ears that are common in pictures of things like cats.

So what happens if—like in the case of cellular automaton prediction above—we train an autoencoder more specifically on the kinds of images we want?

Here are two very simple neural nets that we can use as an “encoder” and a “decoder” to make an autoencoder:

Now let’s take the standard MNIST image training set, and use these to train the autoencoder:

Each of these images has 28×28 pixels. But in the middle of the autoencoder we have a layer with just two elements. So this means that whatever we ask it to encode must be reduced to just two numbers:

And what we see here is that at least for images that look more or less like the ones it was trained on, the autoencoder manages to reconstruct something that looks at least roughly right, even from the radical compression. If you give it other kinds of images, however, it won’t be as successful, instead basically just insisting on reconstructing them as looking like images from its training set:

OK, so what about training it on cellular automaton images? Let’s take 10 million images generated with a particular rule:

Now we train our autoencoder on these images. Then we try feeding it similar images:

The results are at best very approximate; this small neural net didn’t manage to learn the “detailed ways of this particular cellular automaton”. If it had been successful at characterizing all the apparent complexity of the cellular automaton evolution with just two numbers, then we could have considered this an impressive piece of science. But, unsurprisingly, the neural net was effectively blocked by computational irreducibility.

But even though it can’t “seriously crack computational irreducibility” the neural net can still “make useful discoveries”, in effect by finding little pieces of computational reducibility, and little regularities. So, for example, if we take images of “noisy letters” and use a neural net to reduce them to pairs of numbers, and use these numbers to place the images, we get a “dimension-reduced feature space plot” that separates images of different letters:

But consider, for example, a collection of cellular automata with different rules:

Here’s how a typical neural net would arrange these images in “feature space”:

And, yes, this has almost managed to automatically discover the four classes of behavior that I identified in early 1983. But it’s not quite there. Though in a sense this is a difficult case, very much face-to-face with computational irreducibility. And there are plenty of cases (think: arrangement of the periodic table based on element properties; similarity of fluid flows based on Reynolds number; etc.) where one can expect a neural net to key into pockets of computational reducibility and at least successfully recapitulate existing scientific discoveries.

AI in the Non-human World

In its original concept AI was about developing artificial analogs of human intelligence. And indeed the recent great successes of AI—say in visual object recognition or language generation—are all about having artificial systems that reproduce the essence of what humans do. It’s not that there’s a precise theoretical definition of what makes an image be of a cat versus of a dog. What matters is that we can have a neural net that will come to the same conclusions as humans do.

So why does this work? Probably it’s because neural nets capture the architectural essence of actual brains. Of course the details of artificial neural networks aren’t the same as biological brains. But in a sense the big surprise of modern AI is that there seems to be enough universality to make artificial neural nets behave in ways that are functionally similar to human brains, at least when it comes to things like visual object recognition or language generation.

But what about questions in science? At one level we can ask whether neural nets can emulate what human scientists do. But there’s also another level: is it possible that neural nets can just directly work out how systems—say in nature—behave? Imagine we’re studying some physical process. Human scientists might find some human-level description of the system, say in terms of mathematical equations. But the system itself is just directly doing what it does. And the question is whether that’s something a neural net can capture.

And if neural nets “work” on “human-like tasks” only because they’re architecturally similar to brains, there’s no immediate reason to think that they should be able to capture “raw natural processes” that aren’t anything to do with brains. So what’s going on when AI does something like predicting protein folding?

One part of the story, I suspect, is that even though the physical process of protein folding has nothing to do with humans, the question of what aspects of it we consider significant does. We don’t expect that the neural net will predict the exact position of every atom (and in natural environments the atoms in a protein don’t even have precisely fixed positions). Instead, we want to know things like whether the protein has the “right general shape”, with the right “identifiable features” (like, say, alpha helices), or the right functional properties. And these are now more “human” questions—more in the “eye of the beholder”—and more like a question such as whether we humans judge an image to be of a cat versus a dog. So if we conclude that a neural net “solves the scientific problem” of how a protein folds, it might be at least in part just because the criteria of success that our brains (“subjectively”) apply are something that a neural net—with its brain-like architecture—happens to be able to deliver.

It’s a bit like producing an image with generative AI. At the level of basic human visual perception, it may look like something we recognize. But if we scrutinize it, we can see that it’s not “objectively” what we think it is:

It wasn’t ever really practical with “first-principles physics” to figure out how proteins fold. So the fact that neural nets can get even roughly correct answers is impressive. So how do they do it? A significant part of it is surely effectively just matching chunks of protein to what’s in the training set—and then finding “plausible” ways to “stitch” these chunks together. But there’s probably something else too. One’s familiar with certain “pieces of regularity” in proteins (things like alpha helices and beta sheets). But it seems likely that neural nets are effectively plugging into other kinds of regularity; they’ve somehow found pockets of reducibility that we didn’t know were there. And particularly if just a few pockets of reducibility show up over and over again, they’ll effectively represent new, general “results in science” (say, some new kind of commonly occurring “meta-motif” in protein structure).

But while it’s fundamentally inevitable that there must be an infinite number of pockets of computational reducibility in the end, it’s not clear at the outset either how significant these might be in things we care about, or how successful neural net methods might be in finding them. We might imagine that insofar as neural nets mirror the essential operation of our brains, they’d only be able to find pockets of reducibility in cases where we humans could also readily discover them, say by looking at some visualization or another.

But an important point is that our brains are normally “trained” only on data that we readily experience with our senses: we’ve seen the equivalent of billions of images, and we’ve heard zillions of sounds. But we don’t have direct experience of the microscopic motions of molecules, or of a multitude of kinds of data that scientific observations and measuring devices can deliver.

A neural net, however, can “grow up” with very different “sensory experiences”—say directly experiencing “chemical space”, or, for that matter “metamathematical space”, or the space of financial transactions, or interactions between biological organisms, or whatever. But what kinds of pockets of computational reducibility exist in such cases? Mostly we don’t know. We know the ones that correspond to “known science”. But even though we can expect others must exist, we don’t normally know what they are.

Will they be “accessible” to neural nets? Again, we don’t know. Quite likely, if they are accessible, then there’ll be some representation—or, say, visualization—in which the reducibility will be “obvious” to us. But there are plenty of ways this could fail. For example, the reducibility could be “visually obvious”, but only, say, in 3D volumes where, for example, it’s hard even to distinguish different structures of fluffy clouds. Or perhaps the reducibility could be revealed only through some computation that’s not readily handled by a neural net.

Inevitably there are many systems that show computational irreducibility, and which—at least in their full form—must be inaccessible to any “shortcut method”, based on neural nets or otherwise. But what we’re asking is whether, when there is a pocket of computational reducibility, it can be captured by a neural net.

But once again we’re confronted with the fact that there are no “model-less models”. Some particular kind of neural net will readily be able to capture some particular kinds of computational reducibility; another will readily be able to capture others. And, yes, you can always construct a neural net that will approximate any given specific function. But in capturing some general kind of computational reducibility, we are asking for much more—and what we can get will inevitably depend on the underlying structure of the neural net.

But let’s say we’ve got a neural net to successfully key into computational reducibility in a particular system. Does that mean it can predict everything? Typically no. Because almost always the computational reducibility is “just a pocket”, and there’s plenty of computational irreducibility—and “surprises”—“outside”.

And indeed this seems to happen even in the case of something like protein folding. Here are some examples of proteins with what we perceive as fairly simple structures—and the neural net prediction (in yellow) agrees quite well with the results of physical experiments (gray tubes):

But for proteins with what we perceive as more complicated structures, the agreement is often not nearly as good:

These proteins are all at least similar to ones that were used to train the neural net. But how about very different proteins—say ones with random sequences of amino acids?

It’s hard to know how well the neural net does here; it seems likely that particularly if there are “surprises” it won’t successfully capture them. (Of course, it could be that all “reasonable proteins” that normally appear in biology could have certain features, and it could be “unfair” to apply the neural net to “unbiological” random ones—though for example in the adaptive immune system, biology does effectively generate at least short “random proteins”.)

Solving Equations with AI

In traditional mathematical science the typical setup is: here are some equations for a system; solve them to find out how the system behaves. And before computers, that usually meant that one had to find some “closed-form” formula for the solution. But with computers, there’s an alternative approach: make a discrete “numerical approximation”, and somehow incrementally solve the equations. Getting accurate results, though, may require many steps and lots of computational effort. So then the question is: can AI speed this up? And in particular, can AI, for example, go directly from initial conditions for an equation to a whole solution?

Let’s consider as an example a classical piece of mathematical physics: the three-body problem. Given initial positions and velocities of three point masses interacting via inverse-square-law gravity, what trajectories will the masses follow? There’s a lot of diversity—and often a lot of complexity—which is why the three-body problem has been such a challenge:
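For comparison, here is a minimal Python sketch of the traditional incremental method: velocity Verlet integration of three equal masses with G = 1. The initial conditions are the well-known figure-eight orbit of Chenciner and Montgomery, used here only as a convenient smooth test case, and the step size is an assumption. Each step is cheap, but a long, accurate trajectory requires very many of them.

```python
# Minimal sketch: velocity Verlet (kick-drift-kick) for the planar three-body problem.

def accelerations(pos, masses):
    """Inverse-square gravitational accelerations with G = 1."""
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r3 = (dx * dx + dy * dy) ** 1.5
            acc[i][0] += masses[j] * dx / r3
            acc[i][1] += masses[j] * dy / r3
            acc[j][0] -= masses[i] * dx / r3
            acc[j][1] -= masses[i] * dy / r3
    return acc

def energy(pos, vel, masses):
    ke = sum(0.5 * m * (v[0] ** 2 + v[1] ** 2) for m, v in zip(masses, vel))
    pe = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = ((pos[j][0] - pos[i][0]) ** 2 + (pos[j][1] - pos[i][1]) ** 2) ** 0.5
            pe -= masses[i] * masses[j] / r
    return ke + pe

def verlet(pos, vel, masses, dt, steps):
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    acc = accelerations(pos, masses)
    for _ in range(steps):
        for p, v, a in zip(pos, vel, acc):      # half kick, then drift
            v[0] += 0.5 * dt * a[0]
            v[1] += 0.5 * dt * a[1]
            p[0] += dt * v[0]
            p[1] += dt * v[1]
        acc = accelerations(pos, masses)
        for v, a in zip(vel, acc):              # second half kick
            v[0] += 0.5 * dt * a[0]
            v[1] += 0.5 * dt * a[1]
    return pos, vel

masses = [1.0, 1.0, 1.0]
x1 = (0.97000436, -0.24308753)
v3 = (-0.93240737, -0.86473146)
pos0 = [x1, (-x1[0], -x1[1]), (0.0, 0.0)]
vel0 = [(-v3[0] / 2, -v3[1] / 2), (-v3[0] / 2, -v3[1] / 2), v3]
pos, vel = verlet(pos0, vel0, masses, dt=1e-3, steps=2000)
print("energy drift after t = 2:", energy(pos, vel, masses) - energy(pos0, vel0, masses))
```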

But what if we train a neural net on lots of sample solutions? Can it then figure out the solution in any particular case? We’ll use a rather straightforward “multilayer perceptron” network:

We feed it initial conditions, then ask it to generate a solution. Here are a few examples of what it does, with the correct solutions indicated by the lighter background paths:

When the trajectories are fairly simple, the neural net does decently well. But when things get more complicated, it does decreasingly well. It’s as if the neural net has “successfully memorized” the simple cases, but doesn’t know what to do in more complicated cases. And in the end this is very similar to what we saw above in examples like predicting cellular automaton evolution (and presumably also protein folding).

And, yes, once again this is a story of computational irreducibility. To ask to just “get the solution” in one go is to effectively ask for complete computational reducibility. And insofar as one might imagine that—if only one knew how to do it—one could in principle always get a “closed-form formula” for the solution, one’s implicitly assuming computational reducibility. But for many decades I’ve thought that something like the three-body problem is actually quite full of computational irreducibility.

Of course, had a neural net been able to “crack the problem” and immediately generate solutions, that would effectively have demonstrated computational reducibility. But as it is, the apparent failure of neural nets provides another piece of evidence for computational irreducibility in the three-body problem. (It’s worth mentioning, by the way, that while the three-body problem does show sensitive dependence on initial conditions, that’s not the primary issue here; rather, it’s the actual intrinsic complexity of the trajectories.)

We already know that discrete computational systems like cellular automata are rife with computational irreducibility. And we might have imagined that continuous systems—described for example by differential equations—would have more structure that would somehow make them avoid computational irreducibility. And indeed insofar as neural nets (in their usual formulation) involve continuous numbers, we might have thought that they would be able in some way to key into the structure of continuous systems to be able to predict them. But somehow it seems as if the “force of computational irreducibility” is too strong, and will ultimately be beyond the power of neural networks.

Having said that, though, there can still be a lot of practical value to neural networks in doing things like solving equations. Traditional numerical approximation methods tend to work locally and incrementally (if often adaptively). But neural nets can more readily handle “much larger windows”, in a sense “knowing longer runs of behavior” and being able to “jump ahead” across them. In addition, when one’s dealing with very large numbers of equations (say in robotics or systems engineering), neural nets can typically just “take in all the equations and do something reasonable” whereas traditional methods effectively have to work with the equations one by one.

The three-body problem involves ordinary differential equations. But many practical problems are instead based on partial differential equations (PDEs), in which not just individual coordinates, but whole functions f[x] etc., evolve with time. And, yes, one can use neural nets here as well, often to significant practical advantage. But what about computational irreducibility? Many of the equations and situations most studied in practice (say for engineering purposes) tend to avoid it, but certainly in general it’s there (notably, say, in phenomena like fluid turbulence). And when there’s computational irreducibility, one can’t ultimately expect neural nets to do well. But when it comes to satisfying our human purposes—as in other examples we’ve discussed—things may look better.

As an example, consider predicting the weather. In the end, this is all about PDEs for fluid dynamics (and, yes, there are also other effects to do with clouds, etc.). And as one approach, one can imagine directly and computationally solving these PDEs. But another approach would be to have a neural net just “learn typical patterns of weather” (as old-time meteorologists had to), and then have the network (a bit like for protein folding) try to patch together these patterns to fit whatever situation arises.

How successful will this be? It’ll probably depend on what we’re looking at. It could be that some particular aspect of the weather shows considerable computational reducibility and is quite predictable, say by neural nets. And if this is the aspect of the weather that we care about, we might conclude that the neural net is doing well. But if something we care about (“will it rain tomorrow?”) doesn’t tap into a pocket of computational reducibility, then neural nets typically won’t be successful in predicting it—and instead there’d be no choice but to do explicit computation, and perhaps impractically much of it.

AI for Multicomputation

In what we’ve discussed so far, we’ve mostly been concerned with seeing whether AI can help us “jump ahead” and shortcut some computational process or another. But there are also lots of situations where what’s of interest is instead to shortcut what one can call a multicomputational process, in which there are many possible outcomes at each step, and the goal is for example to find a path to some final outcome.

As a simple example of a multicomputational process, let’s consider a multiway system operating on strings, where at each step we apply the rules {A → BBB, BB → A} in all possible ways:
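The multiway rewriting itself is easy to state in code. In this minimal Python sketch, each rule is applied at every position where its left-hand side occurs (including overlapping occurrences, so BBB gives both AB and BA), and the distinct results are collected step by step.

```python
# Minimal sketch: successors and step-by-step layers of a string multiway system.

RULES = [("A", "BBB"), ("BB", "A")]

def successors(s, rules=RULES):
    """All strings obtained from s by one rule application at one position."""
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)        # advance by one so overlapping matches count
    return out

def multiway_layers(init, steps):
    """layers[k] = set of states reachable from init in exactly k rule applications."""
    layers = [{init}]
    for _ in range(steps):
        nxt = set()
        for s in layers[-1]:
            nxt |= successors(s)
        layers.append(nxt)
    return layers

for k, layer in enumerate(multiway_layers("A", 5)):
    print(k, sorted(layer))
```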

Given this setup we can ask a question like: what’s the shortest path from A to BABA? And in the case shown here it’s easy to compute the answer, say by explicitly running a pathfinding algorithm on the graph:
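A minimal sketch of that pathfinding in Python, using breadth-first search over the multiway graph generated on the fly. The step limit is our own assumption, needed because the graph is infinite. By our count, the shortest path from A to BABA takes 9 steps.

```python
# Minimal sketch: breadth-first search for a shortest rewriting path.
from collections import deque

RULES = [("A", "BBB"), ("BB", "A")]

def successors(s, rules=RULES):
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def shortest_path(src, dst, max_steps=20):
    """Return a shortest list of states from src to dst, or None within max_steps."""
    parent = {src: None}
    depth = {src: 0}
    queue = deque([src])
    while queue:
        s = queue.popleft()
        if s == dst:
            path = []
            while s is not None:          # walk parent links back to src
                path.append(s)
                s = parent[s]
            return path[::-1]
        if depth[s] == max_steps:
            continue
        for t in sorted(successors(s)):
            if t not in parent:
                parent[t] = s
                depth[t] = depth[s] + 1
                queue.append(t)
    return None

path = shortest_path("A", "BABA")
print(len(path) - 1, "steps:", " -> ".join(path))
```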

There are many kinds of problems that follow this same general pattern. Finding a winning sequence of plays in a game graph. Finding the solution to a puzzle as a sequence of moves through a graph of possibilities. Finding a proof of a theorem given certain axioms. Finding a chemical synthesis pathway given certain basic reactions. And in general solving a multitude of NP problems in which many “nondeterministic” paths of computation are possible.

In the very simple example above, we’re readily able to explicitly generate a whole multiway graph. But in most practical examples, the graph would be astronomically too large. So the challenge is typically to suss out what moves to make without tracing the whole graph of possibilities. One common approach is to try to find a way to assign a score to different possible states or outcomes, and to pursue only paths with (say) the highest scores. In automated theorem proving it’s also common to work “downward from initial propositions” and “upward from final theorems”, trying to see where the paths meet in the middle. And there’s also another important idea: if one has established the “lemma” that there’s a path from X to Y, one can add X → Y as a new rule in the collection of rules.
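The “meet in the middle” strategy can be sketched as a bidirectional breadth-first search. This is a minimal sketch, not a theorem prover. The backward search simply applies the rules in reverse (right-hand side to left-hand side), and each round expands one whole layer of the smaller frontier. When the two searches first meet, the smallest combined depth over the meeting states is the exact shortest distance.

```python
# Minimal sketch: bidirectional BFS on a string rewriting system.

RULES = [("A", "BBB"), ("BB", "A")]

def rewrite_all(s, rules):
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def bidirectional_distance(src, dst, rules=RULES, max_rounds=40):
    """Shortest number of rewrites from src to dst, or None if not found."""
    if src == dst:
        return 0
    backward_rules = [(rhs, lhs) for lhs, rhs in rules]
    dist_f, dist_b = {src: 0}, {dst: 0}
    front_f, front_b = {src}, {dst}
    for _ in range(max_rounds):
        if not front_f or not front_b:
            return None
        forward = len(front_f) <= len(front_b)       # expand the smaller frontier
        front, dist, other, rs = ((front_f, dist_f, dist_b, rules) if forward
                                  else (front_b, dist_b, dist_f, backward_rules))
        new_front = set()
        for s in front:
            for t in rewrite_all(s, rs):
                if t not in dist:
                    dist[t] = dist[s] + 1
                    new_front.add(t)
        if forward:
            front_f = new_front
        else:
            front_b = new_front
        meets = [t for t in new_front if t in other]
        if meets:
            return min(dist_f[t] + dist_b[t] for t in meets)
    return None

print(bidirectional_distance("A", "BABA"))
```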

So how might AI help? As a first approach, we could consider taking something like our string multiway system above, and training what amounts to a language-model AI to generate sequences of tokens that represent paths (or what in a mathematical setting would be proofs). The idea is to feed the AI a collection of valid sequences, and then to present it with the beginning and end of a new sequence, and ask it to fill in the middle.

We’ll use a fairly basic transformer network:

Then we train it by giving lots of sequences of tokens corresponding to valid paths (with E being the “end token”)

together with “negative examples” indicating the absence of paths:

Now we “prompt” the trained network with a “prefix” of the kind that appeared in the training data, and then iteratively run “LLM style” (effectively at zero temperature, i.e. always choosing the “most probable” next token):

For a while, it does perfectly—but near the end it starts making errors, as indicated by the tokens shown in red. Performance also varies with the destination, with some cases going off track right at the beginning:

How can we do better? One possibility is at each step to keep not just the token that’s considered most probable, but a stack of tokens—thereby in effect generating a multiway system that the “LLM controller” could potentially navigate. (One can think of this somewhat whimsically as a “quantum LLM”, that’s always exploring multiple paths of history.)
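The difference between zero-temperature decoding and keeping several candidates alive can be sketched with beam search, a standard decoding technique that is one simple way to realize the “keep a stack of tokens” idea (not necessarily what was done here). The “model” below is a made-up bigram table, not a trained network; with `width=1` the search reduces to always taking the most probable next token.

```python
import math

# Hypothetical bigram "model": next-token probabilities given only the last token.
# "S" is a start token and "E" the end token; the numbers are made up for illustration.
TOY_MODEL = {
    "S": {"A": 0.6, "B": 0.4},
    "A": {"E": 0.4, "A": 0.3, "B": 0.3},
    "B": {"E": 0.9, "A": 0.05, "B": 0.05},
}


def next_probs(seq):
    return TOY_MODEL[seq[-1]]


def beam_decode(prefix, width, max_steps=10, end="E"):
    """Beam search over token sequences; width=1 is greedy, zero-temperature decoding."""
    beams = [(0.0, list(prefix))]  # (log-probability, token sequence)
    for _ in range(max_steps):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == end:
                candidates.append((logp, seq))  # finished sequences are carried over
                continue
            for tok, p in next_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:width]
        if all(seq[-1] == end for _, seq in beams):
            break
    return beams[0][1]


print(beam_decode(["S"], width=1))  # greedy
print(beam_decode(["S"], width=2))  # keeps a second hypothesis alive
```

Here greedy decoding commits to A because it is locally most probable and ends with probability 0.6 × 0.4 = 0.24, while a beam of width 2 keeps B alive and finds S B E with probability 0.4 × 0.9 = 0.36.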

(By the way, we could also imagine training with many different rules, then doing what amounts to zero-shot learning and giving a “pre-prompt” that specifies what rule we want to use in any particular case.)

One of the issues with this LLM approach is that the sequences it generates are often even “locally wrong”: the next element can’t follow from the one before according to the rules given.

But this suggests another approach one can take. Instead of having the AI try to “immediately fill in the whole sequence”, we can have it just pick “where to go next”, always following one of the specified rules. Then a simple goal for training is in effect to get the AI to learn the distance function for the graph, or in other words, to be able to estimate how long the shortest path is (if it exists) from any one node to any other. Given such a function, a typical strategy is to follow what amounts to a path of “steepest descent”—at each step picking the move that the AI estimates will do best in reducing the distance to the destination.
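A minimal sketch of the steepest-descent strategy, assuming a plain string multiway system: `descend` always applies a valid rule, moving to the successor with the smallest estimated distance to the destination. As a stand-in for a learned distance function, the example uses exact distances computed by breadth-first search over the inverted rules (restricted, as an assumption, to strings of at most `max_len` characters). A neural estimate could be passed as `estimate` instead; the step limit and the `None` return then catch runs that go off track.

```python
import math
from collections import deque


def one_step(s, rules):
    """All strings reachable from s by rewriting one occurrence of one rule's left-hand side."""
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out


def distances_to(dst, rules, max_len=10):
    """Exact number of rule applications needed to reach dst from each string (of length
    <= max_len) that can reach it, via BFS backward from dst with inverted rules.
    Assumes no rule has an empty right-hand side."""
    inverse = [(rhs, lhs) for lhs, rhs in rules]
    dist, queue = {dst: 0}, deque([dst])
    while queue:
        s = queue.popleft()
        for t in one_step(s, inverse):
            if len(t) <= max_len and t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


def descend(src, dst, rules, estimate, max_steps=50):
    """Greedy "steepest descent": pick the successor with the smallest estimated distance
    to dst (ties broken alphabetically); give up after max_steps moves."""
    path = [src]
    while path[-1] != dst and len(path) <= max_steps:
        options = one_step(path[-1], rules)
        if not options:
            return None  # dead end: a full system would backtrack here
        path.append(min(options, key=lambda t: (estimate(t), t)))
    return path if path[-1] == dst else None


RULES = [("A", "AB"), ("B", "A")]  # hypothetical rules
exact = distances_to("ABB", RULES)
print(descend("A", "ABB", RULES, lambda t: exact.get(t, math.inf)))
```

With an exact distance function every greedy step lowers the distance by one, so the path found is a shortest one; with an uninformative estimate (say, a constant) the same loop can wander off and never arrive.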

How can this actually be implemented with neural networks? One approach is to use two encoders (say constructed out of transformers)—that in effect generate two embeddings, one for source nodes, and one for destination nodes. The network then combines these embeddings and learns a “metric” that characterizes the distance between the nodes:
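One way to picture the two-encoder architecture is the following NumPy sketch of a forward pass and a loss. It is an assumption-laden stand-in: a one-hot encoding followed by a single tanh layer replaces each transformer encoder, the squared distance between the two embeddings plays the role of the learned “metric”, and a separate logistic head predicts whether the true distance is finite. The alphabet, sizes and weights are hypothetical, and the training loop is omitted.

```python
import numpy as np

ALPHABET = "AB"  # hypothetical alphabet of the multiway system
MAX_LEN = 8      # strings are zero-padded (or truncated) to this length


def encode(s):
    """One-hot encode a string as a flat vector (stand-in for a transformer's input)."""
    x = np.zeros((MAX_LEN, len(ALPHABET)))
    for i, ch in enumerate(s[:MAX_LEN]):
        x[i, ALPHABET.index(ch)] = 1.0
    return x.ravel()


class TwoTowerDistance:
    def __init__(self, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        d_in = MAX_LEN * len(ALPHABET)
        self.W_src = rng.normal(0, 0.3, (d_in, dim))  # encoder for source nodes
        self.W_dst = rng.normal(0, 0.3, (d_in, dim))  # separate encoder for destination nodes
        self.w_fin = rng.normal(0, 0.3, dim)          # head for "is the distance finite?"

    def forward(self, src, dst):
        e_src = np.tanh(encode(src) @ self.W_src)
        e_dst = np.tanh(encode(dst) @ self.W_dst)
        diff = e_src - e_dst
        dist = float(np.sum(diff ** 2))  # predicted distance: a metric on the embeddings
        p_finite = float(1.0 / (1.0 + np.exp(-(np.abs(diff) @ self.w_fin))))
        return dist, p_finite


def loss(model, src, dst, true_dist):
    """Cross-entropy on the finiteness indicator, plus squared error when the distance is finite."""
    dist, p_finite = model.forward(src, dst)
    finite = bool(np.isfinite(true_dist))
    eps = 1e-9
    bce = -np.log(p_finite + eps) if finite else -np.log(1.0 - p_finite + eps)
    return float(bce + ((dist - true_dist) ** 2 if finite else 0.0))


model = TwoTowerDistance()
print(model.forward("AB", "ABB"), loss(model, "AB", "ABB", 1.0))
```

Using two separate encoders (rather than one shared one) lets the predicted distance be asymmetric, which matters because paths in a multiway graph are directed.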

Training such a network on the multiway system we’ve been discussing—by giving it a few million examples of source-destination distances (plus an indicator of whether this distance is infinite)—we can use the network to predict a piece of the distance matrix for the multiway system. And what we find is that this predicted matrix is similar—but definitely not identical—to the actual matrix:

Still, we can imagine trying to build a path where at each step we compute the distances-to-destination that the neural net predicts for each possible next node, then pick the one that “gets furthest” (i.e. has the smallest estimated distance remaining):

Each individual move here is guaranteed to be valid, and we do indeed eventually reach our destination BABA—though in slightly more steps than the true shortest path. But even though we don’t quite find the optimal path, the neural net has managed to allow us to at least somewhat prune our “search space”, by prioritizing nodes and traversing only the red edges:

(A technical point is that the particular rule we’ve used here has the property that all paths between any given pair of nodes always have the same length—so if any path is found, it can be considered “the shortest”. A rule like {A → AAB, BBA → B} doesn’t have this property, and a neural net trained for this rule can end up finding paths that reach the correct destination but aren’t as short as they could be.)

Still, as is typical with neural nets, we can’t be sure how well this will work. The neural net might make us go arbitrarily far “off track”, and it might even lead us to a node where we have no path to our destination—so that if we want to make progress we’ll have to resort to something like traditional algorithmic backtracking.

But at least in simple cases the approach can potentially work well—and the AI can successfully find a path that wins the game, proves the theorem, etc. But one can’t expect it to always work. And the reason is that it’s going to run into multicomputational irreducibility. Just as in a single “thread of computation”, computational irreducibility can mean that there’s no shortcut to just “going through the steps of the computation”, so in a multiway system, multicomputational irreducibility can mean that there’s no shortcut to just “following all the threads of computation”, then seeing, for example, which end up merging with which.

But even though this could happen in principle, does it in fact happen in practice in cases of interest to us humans? In something like games or puzzles, we tend to want it to be hard—but not too hard—to “win”. And when it comes to mathematics and proving theorems, the cases we use for exercises or competitions are similarly ones we want to be hard, but not too hard. But when it comes to mathematical research, and the frontiers of mathematics, one doesn’t immediately expect any such constraint. The result is that one can expect to be face-to-face with multicomputational irreducibility—making it hard for AI to help too much.

There is, however, one footnote to this story, and it has to do with how we choose new directions in mathematics. We can think of a metamathematical space formed by building up theorems from other theorems in all possible ways in a giant multiway graph. But as we’ll discuss below, most of the details of this are far from what human mathematicians would think of as “doing mathematics”. Instead, mathematicians implicitly seem to do mathematics at a “higher level” in which they’ve “coarse grained” this “microscopic metamathematics”—much as we might study a physical fluid in terms of comparatively-simple-to-describe continuous dynamics even though “underneath” there are lots of complicated molecular motions.

So can AI help with mathematics at this “fluid-dynamics-style” level? Potentially so, but mainly in what amounts to providing code assistance. We have something we want to express, say, in Wolfram Language. But we need help—“LLM style”—in going from our informal conception to explicit computational language. And insofar as what we’re doing follows the structural patterns of what’s been done before, we can expect something like an LLM to help. But insofar as what we’re expressing is “truly new”, and inasmuch as our computational language doesn’t involve much “boilerplate”, it’s hard to imagine that an AI trained on what’s been done before will help much. Instead, what we in effect have to do is some multicomputationally irreducible computation that lets us explore some fresh part of the computational universe and the ruliad.

Exploring Spaces of Systems

“Can one find a system that does X?” Say a Turing machine that runs for a very long time before halting. Or a cellular automaton that grows, but only very slowly. Or, for that matter, a chemical with some particular property.

This is a somewhat different type of question than the ones we’ve been discussing so far. It’s not about taking a particular rule and seeing what its consequences are. It’s about identifying what rule might exist that has certain consequences.

And given some space of possible rules, one approach is exhaustive search. In a sense this is ultimately the only “truly unbiased” approach: the one that will discover what’s out there to discover, even when one doesn’t expect it. Of course, even with exhaustive search, one still needs a way to determine whether a particular candidate system meets whatever criterion one has set up. But now this is the problem of predicting a computation—where the things we said above apply.

OK, but can we do better than exhaustive search? And can we, for example, find a way to figure out what rules to explore without having to look at every rule? One approach is to do something like what happens in biological evolution by natural selection: start, say, from a particular rule, and then incrementally change it (perhaps at random), at every step keeping the rule or rules that do best, and discarding the others.

This isn’t “AI” as we’ve operationally defined it here (it’s more like a “genetic algorithm”)—though it is a bit like the inner training loop of a neural net. But will it work? Well, that depends on the structure of the rule space. As one sees in machine learning, it tends to work better in higher-dimensional rule spaces than lower-dimensional ones, because with more dimensions there’s less chance one will get “stuck in a local minimum”, unable to find one’s way out to a “better rule”.

And in general, if the rule space is like a complicated fractal mountainscape, it’s reasonable to expect one can make progress incrementally (and perhaps AI methods like reinforcement learning can help refine what incremental steps to take). But if instead it’s quite flat, with, say, just one “hole” somewhere (“golf-course style”), one can’t expect to “find the hole” incrementally. So what is the typical structure of rule spaces? There are certainly plenty of cases where the rule space is altogether quite large, but the number of dimensions is only modest. And in such cases (an example being finding small Turing machines with long halting times) there often seem to be “isolated solutions” that can’t be reached incrementally. But when there are more dimensions, it seems likely that what amounts to computational irreducibility will more or less guarantee a “random-enough landscape” in which incremental methods can do well, much as we have seen in machine learning in recent years.

So what about AI? Might there be a way for AI to learn how to “pick winners directly in rule space”, without any kind of incremental process? Might we perhaps be able to find some “embedding space” in which the rules we want are laid out in a simple way—and thus effectively “pre-identified” for us? Ultimately it depends on what the rule space is like, and whether the process of exploring it is necessarily (multi)computationally irreducible, or whether at least the aspects of it that we care about can be explored by a computationally reducible process. (By the way, trying to use AI to directly find systems with particular properties is a bit like trying to use AI to directly generate neural nets from data without incremental training.)

Let’s look at a specific simple example based on cellular automata. Say we want to find a cellular automaton rule that—when evolved from a single-cell initial condition—will grow for a while, but then die out after a particular, exact number of steps. We can try to solve this with a very minimal AI-like “evolutionary” approach: start from a random rule, then at each “generation” produce a certain number of “offspring” rules, each with one element randomly changed—then keep whichever is the “best” of these rules. If we want to find a rule that “lives” for exactly 50 steps, we define “best” to be the one that minimizes a “loss function” equal to the distance from 50 of the number of steps a rule actually “lives”.
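The evolutionary procedure described here can be sketched in Python for k-color nearest-neighbor cellular automata, where a rule is a list of k³ outcome values (27 for k = 3, 64 for k = 4). Several details are assumptions rather than the article’s settings: the number of offspring per generation, the step cap used to decide that a rule “never dies”, keeping the parent unless some offspring is at least as good, and pinning the outcome for the all-0 neighborhood to 0 so the background stays blank.

```python
import random


def ca_lifetime(rule, k, max_steps=100):
    """Steps until a k-color nearest-neighbor CA, started from one cell of color 1 on a
    blank background, dies out; None if it survives max_steps steps.
    rule[l*k*k + c*k + r] is the new color for neighborhood (l, c, r); rule[0] must be 0."""
    row = [1]
    for t in range(1, max_steps + 1):
        padded = [0, 0] + row + [0, 0]  # the pattern can grow by one cell per side per step
        row = [rule[padded[i - 1] * k * k + padded[i] * k + padded[i + 1]]
               for i in range(1, len(padded) - 1)]
        if not any(row):
            return t
    return None


def loss(rule, k, target, max_steps):
    """Distance from the target lifetime (a surviving rule counts as max_steps + 1)."""
    life = ca_lifetime(rule, k, max_steps)
    return abs((life if life is not None else max_steps + 1) - target)


def evolve(k=3, target=50, generations=100, offspring=10, seed=None):
    """Each offspring changes one outcome value; keep the best offspring if it is no worse."""
    rng = random.Random(seed)
    n, max_steps = k ** 3, 2 * target
    rule = [0] + [rng.randrange(k) for _ in range(n - 1)]  # rule[0] = 0 keeps background blank
    best = loss(rule, k, target, max_steps)
    history = [best]
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            child = rule[:]
            i = rng.randrange(1, n)  # never mutate rule[0]
            child[i] = rng.choice([c for c in range(k) if c != child[i]])
            children.append((loss(child, k, target, max_steps), child))
        child_loss, child = min(children, key=lambda c: c[0])
        if child_loss <= best:
            rule, best = child, child_loss
        history.append(best)
        if best == 0:
            break
    return rule, history


if __name__ == "__main__":
    rule, history = evolve(k=3, target=50, seed=0)
    print(history[-1], rule)
```

Running `evolve` with different seeds gives different “paths of evolution”, and passing `k=4` switches to the 64-element rule space.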

So, for example, say we start from the randomly chosen (3-color) rule:

Our evolutionary sequence of rules (showing here only the “outcome values”) might be:

If we look at the behavior of these rules, we see that—after an inauspicious start—they manage to successfully evolve to reach a rule that meets the criterion of “living for exactly 50 steps”:

What we’ve shown here is a particular randomly chosen “path of evolution”. But what happens with other paths? Here’s how the “loss” evolves (over the course of 100 generations) for a collection of paths:

And what we see is that there’s only one “winner” here that achieves zero loss; on all the other paths, evolution “gets stuck”.

As we mentioned above, though, with more “dimensions” one’s less likely to get stuck. So, for example, if we look at 4-color cellular automaton rules, there are now 64 rather than 27 possible elements (or effectively dimensions) to change, and in this case, many paths of evolution “get further”

and there are more “winners” such as:

How could something like neural nets help us here? Insofar as we can use them to predict cellular automaton evolution, they might give us a way to speed up what amounts to the computation of the loss for each candidate rule—though from what we saw in an earlier section, computational irreducibility is likely to limit this. Another possibility is that—much as in the previous section—we could try to use neural nets to guide us in which random changes to make at each generation. But while computational irreducibility probably helps in making things “effectively random enough” that we won’t get stuck, it makes it difficult to have something like a neural net successfully tell us “which way to go”.

Science as Narrative

In many ways one can view the essence of science—at least as it’s traditionally been practiced—as being about taking what’s out there in the world and somehow casting it in a form we humans can think about. In effect, we want science to provide a human-accessible narrative for what happens, say in the natural world.

The phenomenon of computational irreducibility now shows us that this will often ultimately not be possible. But whenever there’s a pocket of computational reducibility it means that there’s some kind of reduced description of at least some part of what’s going on. But is that reduced description something that a human could reasonably be expected to understand? Can it, for example, be stated succinctly in words, formulas, or computational language? If it can, then we can think of it as representing a successful “human-level scientific explanation”.

So can AI help us automatically create such explanations? To do so it must in a sense have a model for what we humans understand—and how we express this understanding in words, etc. It doesn’t do much good to say “here are 100 computational steps that produce this result”. To get a “human-level explanation” we need to break this down into pieces that humans can assimilate.

As an example, consider a mathematical proof, generated by automated theorem proving:

A computer can readily check that this is correct, in that each step follows from what comes before. But what we have here is a very “non-human thing”—about which there’s no realistic “human narrative”. So what would it take to make such a narrative? Essentially we’d need “waypoints” that are somehow familiar—perhaps famous theorems that we readily recognize. Of course there may be no such things. Because what we may have is a proof that goes through “uncharted metamathematical territory”. So—AI assisted or not—human mathematics as it exists today may just not have the raw material to let us create a human-level narrative.

In practice, when there’s a fairly “short metamathematical distance” between steps in a proof, it’s realistic to think that a human-level explanation can be given. And what’s needed is very much like what Wolfram|Alpha does when it produces step-by-step explanations of its answers. Can AI help? Potentially, using methods like our second approach to AI-assisted multicomputation above.

And, by the way, our efforts with Wolfram Language help too. Because the whole idea of our computational language is to capture “common lumps of computational work” as built-in constructs—and in a sense the process of designing the language is precisely about identifying “human-assimilable waypoints” for computations. Computational irreducibility tells us that we’ll never be able to find such waypoints for all computations. But our goal is to find waypoints that capture current paradigms and current practice, as well as to define directions and frameworks for extending these—though ultimately “what we humans know about” is something that’s determined by the state of human knowledge as it’s historically evolved.

Proofs and computational language programs are two examples of structured “scientific narratives”. A potentially simpler example—aligned with the mathematical tradition for science—is a pure formula. “It’s a power law”. “It’s a sum of exponentials”. Etc. Can AI help with this? A function like FindFormula is already using machine-learning-inspired techniques to take data and try to produce a “reasonable formula for it”.

Here’s what it does for the first 100 primes:

Going to 10,000 primes it produces a more complicated result:

Or, let’s say we ask about the relation between GDP and population for countries. Then we can get formulas like:

But what (if anything) do these formulas mean? It’s a bit like with proof steps and so on. Unless we can connect what’s in the formulas with things we know about (whether in number theory or economics) it’ll usually be difficult to conclude much from them. Except perhaps in some rare cases where one can say “yes, that’s a new, useful law”—like in this “derivation” of Kepler’s third law (where 0.7 is a pretty good approximation to 2/3):
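A fit of this kind can be reproduced in miniature with an ordinary least-squares line in log-log space (a much simpler procedure than FindFormula, used here only as an illustration). The orbital values are approximate textbook figures for six planets; fitting semi-major axis against orbital period should give an exponent very close to 2/3.

```python
import math

# Approximate semi-major axes (AU) and orbital periods (years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.45),
}


def power_law_exponent(xs, ys):
    """Least-squares slope b in log y = log a + b log x, i.e. y ≈ a x^b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    den = sum((u - mx) ** 2 for u in lx)
    return num / den


axes = [a for a, _ in PLANETS.values()]
periods = [p for _, p in PLANETS.values()]
print(power_law_exponent(periods, axes))  # semi-major axis as a power of period
```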

There’s an even more minimal example of this kind of thing in recognizing numbers. Type a number into Wolfram|Alpha and it’ll try to tell you what “possible closed forms” for the number might be:

There are all sorts of tradeoffs here, some very AI informed. What’s the relative importance of getting more digits right compared to having a simple formula? What about having simple numbers in the formula compared to having “more obscure” mathematical constants (e.g. π versus Champernowne’s number)? When we set up this system for Wolfram|Alpha 15 years ago, we used the negative log frequency of constants in the mathematical literature as a proxy for their “information content”. With modern LLM techniques it may be possible to do a more holistic job of finding what amounts to a “good scientific narrative” for a number.
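The digits-versus-simplicity tradeoff can be sketched as a scoring rule: score = (number of matching digits) − (complexity), where the complexity adds log-sized costs for the integers and a per-constant cost standing in for the negative log frequency of the constant in the literature. The constant costs below are made-up placeholders, not the values Wolfram|Alpha uses, and the search covers only forms p·c/q with small p and q and positive x.

```python
import math

# Hypothetical per-constant costs, standing in for -log(frequency in the literature).
CONSTANT_COST = {"1": 0.0, "pi": 1.0, "e": 1.5, "sqrt2": 1.2}
CONSTANT_VALUE = {"1": 1.0, "pi": math.pi, "e": math.e, "sqrt2": math.sqrt(2)}


def recognize(x, max_int=12, max_digits=15.0):
    """Return (p, q, constant) maximizing matched digits minus formula complexity (x > 0)."""
    best = None
    for name, c in CONSTANT_VALUE.items():
        for p in range(1, max_int + 1):
            for q in range(1, max_int + 1):
                if math.gcd(p, q) != 1:
                    continue  # skip non-reduced fractions like 4/6
                cand = p * c / q
                rel_err = abs(x - cand) / abs(x)
                digits = max_digits if rel_err == 0 else min(max_digits, -math.log10(rel_err))
                complexity = math.log10(p) + math.log10(q) + CONSTANT_COST[name]
                score = digits - complexity
                if best is None or score > best[0]:
                    best = (score, p, q, name)
    return best[1:]


print(recognize(2 * math.pi / 3))
print(recognize(0.75))
```

Raising a constant’s cost or lowering `max_digits` shifts the balance toward simpler but less accurate forms, which is exactly the tradeoff described above.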

But let’s return to things like predicting the outcome of processes such as cellular automaton evolution. In an earlier section we discussed getting neural nets to do this prediction. We viewed this essentially as a “black-box” approach: we wanted to see if we could get a neural net to successfully make predictions, but we weren’t asking to get a “human-level understanding” of those predictions.

It’s a ubiquitous story in machine learning. One trains a neural net to successfully predict, classify, or whatever. But if one “looks inside” it’s very hard to tell what’s going on. Here’s the final result of applying an image identification neural network:

And here are the “intermediate thoughts” generated after going through about half the layers in the network:

Maybe something here is a “definitive signature of catness”. But it’s not part of our current scientific lexicon—so we can’t readily use it to develop a “scientific narrative” that explains how the image should be interpreted.

But what if we could reduce our images to just a few parameters—say using an autoencoder of the kind we discussed above? Conceivably we could set things up so that we’d end up with “interpretable parameters”—or, in other words, parameters where we can give a narrative explanation of what they mean. For example, we could imagine using something like an LLM to pick parameters that somehow align with words or phrases (“pointiness”, “fractal dimension”, etc.) that appear in explanatory text from around the web. And, yes, these words or phrases could be based on analogies (“cactus-shaped”, “cirrus-cloud-like”, etc.)—and something like an LLM could “creatively” come up with these names.
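As a concrete and much simpler cousin of an autoencoder, principal component analysis gives the best linear encoder and decoder for squeezing data into a few parameters. The sketch below applies it to synthetic 16×16 images of Gaussian blobs (a hypothetical dataset). The reconstruction error shrinks as `n_params` grows, but nothing guarantees that the resulting parameters line up with nameable concepts such as position or size.

```python
import numpy as np


def fit_linear_autoencoder(X, n_params):
    """PCA viewed as the optimal linear autoencoder: project centered data onto the
    top n_params principal directions (encode), then map back (decode)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:n_params].T  # shape (pixels, n_params)

    def encode(imgs):
        return (imgs - mu) @ V  # each image -> n_params numbers

    def decode(z):
        return z @ V.T + mu  # n_params numbers -> image

    return encode, decode


# Hypothetical dataset: 16x16 images of Gaussian blobs with random center and width.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:16, 0:16]


def blob(cx, cy, s):
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * s * s)).ravel()


X = np.array([blob(*rng.uniform(4, 12, 2), rng.uniform(1.5, 3.0)) for _ in range(500)])
for n in (1, 2, 3, 5, 10):
    enc, dec = fit_linear_autoencoder(X, n)
    print(n, np.mean((dec(enc(X)) - X) ** 2))
```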

But in the end there’s nothing to say that a pocket of computational reducibility picked out by a certain autoencoder will have any way to be aligned with concepts (scientific or otherwise) that we humans have yet explored, or so far given words to. Indeed, in the ruliad at large, it is overwhelmingly likely that we’ll find ourselves in “interconcept space”—unable to create what we would consider a useful scientific narrative.

This depends a bit, however, on just how we constrain what we’re looking at. We might implicitly define science to be the study of phenomena for which we have—at some time—successfully developed a scientific narrative. And in this case it’s of course inevitable that such a narrative will exist. But even given a fixed method of observation or measurement, it’s basically inevitable that as we explore, computational irreducibility will lead to “surprises” that break out of whatever scientific narrative we were using. Or in other words, if we’re really going to discover new science, then—AI or not—we can’t expect to have a scientific narrative based on preexisting concepts. And perhaps the best we can hope for is that we’ll be able to find pockets of reducibility, and that AI will “understand” enough about us and our intellectual history that it’ll be able to suggest a manageable path of new concepts that we should learn to develop a successful scientific narrative for what we discover.

Finding What’s Interesting

A central part of doing open-ended science is figuring out “what’s interesting”. Let’s say one just enumerates a collection of cellular automata:

The ones that just die out—or make uniform patterns—“don’t seem interesting”. The first time one sees a nested pattern generated by a cellular automaton, it might seem interesting (as it did to me in 1981). But pretty soon it comes to seem routine. And at least as a matter of basic ruliology, what one ends up looking for is “surprise”: qualitatively new behavior one hasn’t seen before. (If one’s concerned with specific applications, say to modeling particular systems in the world, then one might instead want to look at rules with certain structure, whether or not their behavior “abstractly seems interesting”.)

The fact that one can expect “surprises” (and indeed, be able to do useful, truly open-ended science at all) is a consequence of computational irreducibility. And whenever there’s a “lack of surprise” it’s basically a sign of computational reducibility. And this makes it plausible that AI—and neural nets—could learn to identify at least certain kinds of “anomalies” or “surprises”, and thereby discover some version of “what’s interesting”.

Usually the basic idea is to have a neural net learn the “typical distribution” of data—and then to identify outliers relative to this. So for example we might look at a large number of cellular automaton patterns to learn their “typical distribution”, then plot a projection of this onto a 2D feature space, indicating where certain specific patterns lie:
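A minimal version of “learn the typical distribution, then flag outliers” can be sketched without a neural net: compute a couple of hand-picked features for each of the 256 elementary cellular automaton patterns, fit a single Gaussian to them, and rank patterns by squared Mahalanobis distance from the mean. The features (cell density and zlib compression ratio), the pattern size and the Gaussian model are all assumptions standing in for the learned feature space described here.

```python
import zlib

import numpy as np


def eca_pattern(rule, width=101, steps=50):
    """Spacetime pattern of elementary CA `rule` (0-255) started from a single black cell."""
    row = np.zeros(width, dtype=np.int64)
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        idx = 4 * np.roll(row, 1) + 2 * row + np.roll(row, -1)  # neighborhood code 0..7
        row = (rule >> idx) & 1  # look up the rule's output bit for each neighborhood
        rows.append(row)
    return np.array(rows, dtype=np.uint8)


def features(pattern):
    """Two hand-picked features: fraction of black cells, and zlib compression ratio."""
    raw = np.packbits(pattern).tobytes()
    return np.array([pattern.mean(), len(zlib.compress(raw, 9)) / len(raw)])


def rank_outliers(rules, top=5):
    """Fit one Gaussian to the feature vectors; rank rules by squared Mahalanobis distance."""
    X = np.array([features(eca_pattern(r)) for r in rules])
    D = X - X.mean(axis=0)
    inv = np.linalg.inv(np.cov(X, rowvar=False) + 1e-9 * np.eye(X.shape[1]))
    d2 = np.einsum("ij,jk,ik->i", D, inv, D)
    order = np.argsort(-d2)
    return [rules[i] for i in order[:top]], d2


most_unusual, scores = rank_outliers(list(range(256)))
print(most_unusual)
```

Which rules come out as “outliers” depends entirely on the chosen features and density model, which is a small-scale version of the “eye of the beholder” issue discussed next.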

Some of the patterns show up in parts of the distribution where their probabilities are high, but others show up where the probabilities are low—and these are the outliers:

Are these outliers “interesting”? Well, it depends on your definition of “interesting”. And in the end that’s “in the eye of the beholder”. Here, the “beholder” is a neural net. And, yes, these particular patterns wouldn’t be what I would have picked. But relative to the “typical patterns” they do seem at least “somewhat different”. And presumably it’s basically a story like the one with neural nets that distinguish pictures of cats and dogs: neural nets make at least somewhat similar judgements to the ones we do—perhaps because our brains are structurally like neural nets.

OK, but what does a neural net “intrinsically find interesting”? If the neural net is trained then it’ll very much be influenced by what we can think of as the “cultural background” it gets from this training. But what if we just set up neural nets with a given architecture, and pick their weights at random? Let’s say they’re neural nets that compute functions ℝ → ℝ. Then here are examples of collections of functions they compute:

Not too surprisingly, the functions that come out very much reflect the underlying activation functions that appear at the nodes of our neural nets. But we can see that—a bit like in a random walk process—“more extreme” functions are less likely to be produced by neural nets with random weights, so can be thought of as “intrinsically more surprising” for neural nets.
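One way to probe which functions random-weight nets “prefer” is to sample many untrained networks and look at the spread of some measure of extremeness. This sketch assumes small fully connected tanh networks ℝ → ℝ with Gaussian weights and uses total variation on [−1, 1] as the measure; both choices are illustrative, not necessarily the setup used for the examples described here.

```python
import numpy as np


def random_net(rng, width=20, depth=3):
    """An untrained fully connected tanh network R -> R with Gaussian weights."""
    sizes = [1] + [width] * depth + [1]
    layers = [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), rng.normal(0.0, 1.0, n))
              for m, n in zip(sizes[:-1], sizes[1:])]

    def f(x):
        h = np.asarray(x, dtype=float).reshape(-1, 1)
        for i, (W, b) in enumerate(layers):
            h = h @ W + b
            if i < len(layers) - 1:  # no activation on the output layer
                h = np.tanh(h)
        return h.ravel()

    return f


rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 201)
# Total variation of each sampled function on [-1, 1]: a crude "extremeness" measure.
tv = np.array([np.abs(np.diff(random_net(rng)(xs))).sum() for _ in range(1000)])
print(np.percentile(tv, [50, 90, 99, 100]))
```

Comparing the median with the upper percentiles gives a rough sense of how rare the more extreme functions are for this particular architecture and weight distribution.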

But, OK, “surprise” is one potential criterion for “interestingness”. But there are others. And to get a sense of this we can look at various kinds of constructs that can be enumerated, and where we can ask which possible ones we consider “interesting enough” that we’ve, for example, studied them, given them specific names, or recorded them in registries.

As a first example, let’s consider a family of hydrocarbon molecules: alkanes. Any such molecule can be represented by a tree graph whose nodes correspond to carbon atoms and have valence at most 4. There are a total of 75 alkanes with fewer than 10 carbons, and all of them typically appear in standard lists of chemicals (and in our Wolfram Knowledgebase). But with 10 carbons only some alkanes are “interesting enough” that they’re listed, for example in our knowledgebase (aggregating different registries one finds more alkanes listed, but by 11 carbons at least 42 out of 159 always seem to be “missing”—and are not highlighted here):
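The alkane counts mentioned here can be checked with a short brute-force enumeration: grow carbon skeletons one atom at a time, keeping only trees in which no carbon has more than 4 carbon neighbors, and deduplicate them with a canonical form computed from the tree’s center(s). This is a plain sketch for small sizes, not how the Wolfram Knowledgebase enumerates alkanes. It should reproduce 75 alkanes with fewer than 10 carbons in total, 75 with exactly 10, and 159 with 11.

```python
def centers(adj):
    """Center node(s) of a tree, found by repeatedly stripping leaves."""
    n = len(adj)
    if n <= 2:
        return list(range(n))
    deg = [len(nbrs) for nbrs in adj]
    leaves = [v for v in range(n) if deg[v] == 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        new_leaves = []
        for v in leaves:
            for u in adj[v]:
                deg[u] -= 1
                if deg[u] == 1:
                    new_leaves.append(u)
        leaves = new_leaves
    return leaves


def canonical(adj):
    """Isomorphism-invariant string: AHU encoding rooted at each center, minimized."""
    def enc(v, parent):
        return "(" + "".join(sorted(enc(c, v) for c in adj[v] if c != parent)) + ")"
    return min(enc(c, -1) for c in centers(adj))


def grow(adj, v):
    """Attach a new carbon (leaf) to node v."""
    n = len(adj)
    new = [list(nbrs) for nbrs in adj]
    new[v].append(n)
    new.append([v])
    return tuple(tuple(nbrs) for nbrs in new)


def alkane_counts(max_carbons, max_valence=4):
    """Number of distinct carbon skeletons (trees with degree <= 4) for 1..max_carbons."""
    trees = {canonical(((),)): ((),)}
    counts = [1]
    for _ in range(max_carbons - 1):
        nxt = {}
        for adj in trees.values():
            for v in range(len(adj)):
                if len(adj[v]) < max_valence:
                    t = grow(adj, v)
                    nxt.setdefault(canonical(t), t)
        trees = nxt
        counts.append(len(trees))
    return counts


print(alkane_counts(11))
```

Every skeleton with n + 1 carbons arises by adding a leaf to some skeleton with n carbons, so growing one atom at a time and deduplicating is enough to reach all of them.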

What makes some of these alkanes “more interesting” in this sense than others? Operationally it’s a question of whether they’ve been studied, say in the academic literature. But what determines this? Partly it’s a matter of whether they “occur in nature”. Sometimes—say in petroleum or coal—alkanes form through what amount to “random reactions”, where unbranched molecules tend to be favored. But alkanes can also be produced in biological systems, through careful orchestration, say by enzymes. But wherever they come from, it’s as if the alkanes that are more familiar are the ones that seem “more interesting”. So what about “surprise”? Whether a “surprise alkane”—say made by explicit synthesis in a lab—is considered “interesting” probably depends first and foremost on whether it’s identified to have “interesting properties”. And that in turn tends to be a question of how its properties fit into the whole web of human knowledge and technology.

So can AI help in determining which alkanes we’re likely to consider interesting? Traditional computational chemistry—perhaps sped up by AI—can potentially determine the rates at which different alkanes are “randomly produced”. And in a quite different direction, analyzing the academic literature—say with an LLM—can potentially predict how much a certain alkane can be expected to be studied or talked about. Or (and this is particularly relevant for drug candidates) whether there are existing hints of “if only we could find a molecule that does ___” that one can pick up from things like academic literature.

As another example, let’s consider mathematical theorems. Much like with chemicals, one can in principle enumerate possible mathematical theorems by starting from axioms and then seeing what theorems can progressively be derived from them. Here’s what happens in just two steps starting from some typical axioms for logic:

There are a vast number of “uninteresting” (and often seemingly very pedantic) theorems here. But among all these there are two that are interesting enough that they’re typically given names (“the idempotence laws”) in textbooks of logic. Is there any way to determine whether a theorem will be given a name? One might have thought that would be a purely historical question. But at least in the case of logic there seems to be a systematic pattern. Let’s say one enumerates theorems of logic starting with the simplest, and going on in a lexicographic order. Most theorems in the list will be derivable from earlier ones. But a few will not. And these turn out to be basically exactly the ones that are typically given names (and highlighted here):

Or, in other words, at least in the rather constrained case of basic logic, the theorems considered interesting enough to be given names are the ones that “surprise us with new information”.

If we look more generally in “metamathematical space” we can get some empirical idea of where theorems that have been “considered interesting” lie:

Could an AI predict this? We could certainly create a neural net trained from the existing literature of mathematics, and its few million stated theorems. And we could then start feeding this neural net theorems found by systematic enumeration, and asking it to determine how plausible they are as things that might appear in mathematical literature. And in our systematic enumeration we could even ask the neural net to determine what “directions” are likely to be “interesting”—like in our second method for “AI-assisted traversal of multiway systems” above.

But when it comes to finding “genuinely new science” (or math) there’s a problem with this—because a neural net trained from existing literature is basically going to be looking for “more of the same”. Much like the typical operation of peer review, what it’ll “accept” is what’s “mainstream” and “not too surprising”. So what about the surprises that computational irreducibility inevitably implies will be there? By definition, they won’t be “easily reducible” to what’s been seen before.

Yes, they can provide new facts. And they may even have important applications. But there often won’t be—at least at first—a “human-accessible narrative” that “reaches” them. And what it’ll take to create that is for us humans to internalize some new concept that eventually becomes familiar. (And, yes, as we discussed above, if some particular new concept—or, say, new theorem—seems to be a “nexus” for reaching things, that becomes a target for a concept that’s worth us “adding”.)

But in the end, there’s a certain arbitrariness in which “new facts” or “new directions” we want to internalize. Yes, if we go in a particular direction it may lead us to certain ideas or technology or activities. But abstractly we don’t know which of the directions we might go in is “right”; at least in the first instance, that seems like a quintessential matter of human choice. There’s a potential wrinkle, though. What if our AIs know enough about human psychology and society that they can predict “what we’d like”? At first it might seem that they could then successfully “pick directions”. But once again computational irreducibility blocks us—because ultimately we can’t “know what we’ll like” until we “get there”.

We can relate all this to generative AI, for example for images or text. At the outset, we might imagine enumerating images that consist of arbitrary arrays of pixels. But an absolutely overwhelming fraction of these won’t be at all “interesting” to us; they’ll just look to us like “random noise”:

By training a neural net on billions of human-selected images, we can get it to produce images that are somehow “generally like what we find interesting”. Sometimes the images produced will be recognizable to the point where we’ll be able to give a “narrative explanation” of “what they look like”:

But very often we’ll find ourselves with images “out in interconcept space”:

Are these “interesting”? It’s hard to say. Scanning the brain of a person looking at them, we might notice some particular signal—and perhaps an AI could learn to predict that. But inevitably that signal would change if some type of “interconcept image” became popular, and started, say, to be recognized as a kind of art that people are familiar with.

And in the end we’re back to the same point: things are ultimately “interesting” if our choices as a civilization make them so. There’s no abstract notion of “interestingness” that an AI or anything can “go out and discover” ahead of our choices.

And so it is with science. There’s no abstract way to know “what’s interesting” out of all the possibilities in the ruliad; that’s ultimately determined by the choices we make in “colonizing” the ruliad.

But what if—instead of going out into the “wilds of the ruliad”—we stay close to what’s already been done in science, and what’s already “deemed interesting”? Can AI help us extend what’s there? As a practical matter—at least when supplemented with our computational language as a tool—the answer is at some level surely yes. And for example LLMs should be able to produce things that follow the pattern of academic papers—with dashes of “originality” coming from whatever randomness is used in the LLM.

How far can such an approach get? The existing academic literature is certainly full of holes. Phenomenon A was investigated in system X, and B in Y, but not vice versa, etc. And we can expect that AIs—and LLMs in particular—can be useful in identifying these holes, and in effect “planning” what science is (by this criterion) interesting to do. And beyond this, we can expect that things like LLMs will be helpful in mapping out “usual and customary” paths by which the science should be done. (“When you’re analyzing data like this, one typically quotes such-and-such a metric”; “when you’re doing an experiment like this, you typically prepare a sample like this”; etc.) When it comes to actually “doing the science”, though, our actual computational language tools—together with things like computationally controlled experimental equipment—will presumably be what’s usually more central.

But let’s say we’ve defined some major objective for science (“figure out how to reverse aging”, or, a bit more modestly, “solve cryonics”). In giving such an objective, we’re specifying something we consider “interesting”. And then the problem of getting to that objective is—at least conceptually—like finding a proof of a theorem or a synthesis pathway for a chemical. There are certain “moves we can make”, and we need to find out how to “string these together” to get to the objective we want. Inevitably, though, there’s an issue with (multi)computational irreducibility: there may be an irreducible number of steps we need to take to get to the result. And even though we may consider the final objective “interesting”, there’s no guarantee that we’ll find the intermediate steps even slightly interesting. Indeed, in many proofs—as well as in many engineering systems—one may need to build on an immense number of excruciating details to get to the final “interesting result”.
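As a toy illustration of the “string moves together” framing, here is a minimal breadth-first search sketch in Python. Everything in it is an assumption for illustration: the states are plain integers, and the two hypothetical moves (“add one” and “double”) stand in for the vastly richer move sets of a real proof search or synthesis planner.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical move set: each named move maps one integer state to another.
MOVES: dict[str, Callable[[int], int]] = {
    "add1": lambda x: x + 1,
    "double": lambda x: 2 * x,
}


def shortest_move_sequence(start: int, goal: int,
                           moves: dict[str, Callable[[int], int]],
                           limit: int = 10_000) -> Optional[list[str]]:
    """Breadth-first search for the fewest moves taking start to goal.

    States whose absolute value exceeds `limit` are pruned so the search
    always terminates; returns None if the goal is not reached.
    """
    frontier = deque([start])
    parent: dict[int, Optional[tuple[int, str]]] = {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            # Walk back through the recorded parents to recover the moves.
            path = []
            while parent[state] is not None:
                state, name = parent[state]
                path.append(name)
            return path[::-1]
        for name, move in moves.items():
            nxt = move(state)
            if nxt not in parent and abs(nxt) <= limit:
                parent[nxt] = (state, name)
                frontier.append(nxt)
    return None


if __name__ == "__main__":
    print(shortest_move_sequence(1, 10, MOVES))
```

With these two moves, going from 1 to 10 takes four steps, and the search returns ['add1', 'double', 'add1', 'double']. Nothing about the final state tells you in advance which intermediate states the path will pass through; the search has to actually explore them.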

But let’s talk more about the question of what to study—or, in effect, what’s “interesting to study”. “Normal science” tends to be concerned with making incremental progress, remaining within existing paradigms, but gradually filling in and extending what’s there. Usually the most fertile areas are on the interfaces between existing well-developed areas. At the outset, it’s not at all obvious that different areas of science should ultimately fit together at all. But given the concept of the ruliad as the ultimate underlying structure, this begins to seem less surprising. Still, to actually see how different areas of science can be “knitted together” one will often have to identify—perhaps initially quite surprising—analogies between very different descriptive frameworks. “A decidable theory in metamathematics is like a black hole in physics”; “concepts in language are like particles in rulial space”; etc.

And this is an area where one can expect LLMs to be helpful. Having seen the “linguistic pattern” of one area, one can expect them to be able to see its correspondence in another area—potentially with important consequences.

But what about fresh new directions in science? Historically, these have often been the result of applying some new practical methodology (say for doing a new kind of experiment or measurement)—that happens to open up some “new place to look”, where people have never looked before. But usually one of the big challenges is to recognize that something one sees is actually “interesting”. And doing this often in effect involves creating some new conceptual framework or paradigm.

So can AI—as we’ve been discussing it here—be expected to do this? It doesn’t seem likely. AI is typically something trained on existing human material, intended to extrapolate directly from that. It’s not something built to “go out into the wilds of the ruliad”, far from anything already connected to humans.

But in a sense that is the domain of “arbitrary computation”, and of things like the simple programs we might enumerate or pick at random in ruliology. And, yes, by going out into the “wilds of the ruliad” it’s easy enough to find fresh, new things not currently assimilated into science. The challenge, though, is to connect them to anything we humans currently “understand” or “find interesting”. And that, as we’ve said before, is something that quintessentially involves human choice, and the foibles of human history. There is an infinite collection of paths that could be taken. (And indeed, in a “society of AIs”, there could be AIs that pursue a certain collection of them.) But in the end what matters to us humans and the enterprise we normally call “science” is our internal experience. And that’s something we ultimately have to form for ourselves.

Beyond the “Exact Sciences”

In areas like the physical sciences we’re used to the idea of being able to develop broad theories that can do things like make quantitative predictions. But there are many areas—for example in the biological, human and social sciences—that have tended to operate in much less formal ways, and where things like long chains of successful theoretical inferences are largely unheard of.

So might AI change that? There seem to be some interesting possibilities, particularly around the new kinds of “measurements” that AI enables. “How similar are those artworks?” “How close are the morphologies of those organisms?” “How different are those myths?” These are questions that in the past one mostly had to address by writing an essay. But now AI potentially gives us a path to make such things more definite—and in some sense quantitative.

Typically the key idea is to figure out how to take “unstructured raw data” and extract “meaningful features” from it that can be handled in formal, structured ways. And the main thing that makes this possible is that we have AIs that have been trained on large corpora that reflect “what’s typical in our world”—and which have in effect formed definite internal representations of the world, in terms of which things can for example be described (as we did above) by lists of numbers.

What do those numbers mean? At the outset we typically have no idea; they’re just the output of some neural net encoder. But what’s important is that they’re definite, and repeatable. Given the same input data, one will always get the same numbers. And, what’s more, it’s typical that when data “seems similar” to us, it’ll tend to be assigned nearby numbers.
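A minimal Python sketch of those two properties (repeatability, and similar inputs getting nearby numbers) follows. It deliberately swaps a neural net encoder for a much cruder hypothetical one, toy_encode, that just counts character trigrams; cosine similarity is one common choice, assumed here, for measuring how “nearby” two such feature vectors are.

```python
import math
from collections import Counter


def toy_encode(text: str) -> Counter:
    """Represent text as counts of its character trigrams.

    A hypothetical stand-in for a neural net encoder: it is a plain
    deterministic function, so the same input always gives the same numbers.
    """
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    a = toy_encode("a small grey cat")
    b = toy_encode("a small gray cat")
    c = toy_encode("quarterly interest rates")
    print(toy_encode("a small grey cat") == a)  # True: same input, same numbers
    print(round(cosine(a, b), 3))               # 0.833: similar texts are nearby
    print(round(cosine(a, c), 3))               # 0.0: no shared trigrams
```

Here “grey” and “gray” change only 3 of the 18 trigrams, so the two phrases score 15/18, while the unrelated phrase shares none. A trained encoder would capture far subtler kinds of similarity, but the shape of the interface is the same: data in, definite numbers out.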

In an area like physical science, we expect to build specific measuring devices that measure quantities we “know how to interpret”. But AI is much more of a black box: something is being measured, but at least at the outset we don’t necessarily have any interpretation of it. Sometimes we’ll be able to do training that associates the measurement with some description we know, so that we’ll get at least a rough interpretation (as in a case like sentiment analysis). But often we won’t.

(And it has to be said that something similar can happen even in physical science. Let’s say we test whether one material scratches the surface of another. Presumably we can interpret that as some kind of hardness of the material, but really it’s just a measurement that becomes significant if we can successfully associate it with other things.)

One thing that’s particularly notable about “AI measurements” is how they can potentially pick out “small signals” from large volumes of unstructured data. We’re used to having methods like statistics to do similar things on structured, numerical data. But it’s a different story to ask from billions of webpages whether, say, kids who like science typically prefer cats or dogs.

But given an “AI measurement” what can we expect to do with it? None of this is very clear yet, but it seems at least possible that we can start to find formal relationships. Perhaps it will be a quantitative relationship involving numbers; perhaps it will be better represented by a program that describes a computational process by which one measurement leads to others.

It’s been common for some time in areas like quantitative finance to find relationships between what amount to simple forms of “AI measurements”—and to be concerned mainly with whether they work, rather than why they work, or how one might narratively describe them.

In a sense it seems rather unsatisfactory to try to build science on “black-box” AI measurements that one can’t interpret. But at some level this is just an accelerated version of what we often do, say with everyday language. We’re exposed to some new observation or measurement. And eventually we invent words to describe it (“it looks like a fractal”, etc.). And then we can start “reasoning in terms of it”, etc.

AI measurements are potentially a much richer source of formalizable material. But how should we do that formalization? Computational language seems to be key. And indeed we already have examples in the Wolfram Language—where functions like ImageIdentify or TextCases (or, for that matter, LLMFunction) can effectively make “AI measurements”, but then we can take their results, and work symbolically with them.

In physical science we often imagine that we’re working only with “objective measurements” (though my recent “observer theory” implies that even there our nature as observers is actually crucial). But AI measurements seem to have a certain immediate “subjectivity”—and indeed their details (say, associated with the particulars of a neural net encoder) will be different for every different AI we use. But what’s important is that if the AI is trained on very large amounts of human experience, there’ll be a certain robustness to it. In a sense we can view many AI measurements as being like the output of a “societal observer”—that uses something like the whole mass of human experience, and in doing so gains a certain “centrality” and “inertia”.

What kind of science can we expect to build on the basis of what a “societal observer” measures? For the most part, we don’t yet know. There’s some reason to think that (as in the case of physics and metamathematics) such measurements might tap into pockets of computational reducibility. And if that’s the case, we can expect that we’ll be able to start doing things like making predictions—albeit perhaps only for the results of “AI measurements” which we’ll find hard to interpret. But by connecting such AI measurements to computational language, there seems to be the potential to start constructing “formalized science” in places where it’s never been possible before—and in doing so, to extend the domain of what we might call “exact sciences”.

(By the way, another promising application of modern AIs is in setting up “repeatable personas”: entities that effectively behave like humans with certain characteristics, but on which large-scale repeatable experiments of the kind typical in physical science can be done.)

So… Can AI Solve Science?

At the outset, one might be surprised that science is even possible. Why is it that there is regularity that we can identify in the world that allows us to form “scientific narratives”? Indeed, we now know from things like the concept of the ruliad that computational irreducibility is inevitably ubiquitous—and with it fundamental irregularity and unpredictability. But it turns out that the very presence of computational irreducibility necessarily implies that there must be pockets of computational reducibility, where at least certain things are regular and predictable. And it is within these pockets of reducibility that science fundamentally lives—and indeed that we try to operate and engage with the world.

So how does this relate to AI? Well, the whole story of things like trained neural nets that we’ve discussed here is a story of leveraging computational reducibility, and in particular computational reducibility that’s somehow aligned with what human minds also use. In the past the main way to capture—and capitalize on—computational reducibility was to develop formal ways to describe things, typically using mathematics and mathematical formulas. AI in effect provides a new way to make use of computational reducibility. Normally there’s no human-level narrative to how it works; it’s just that somehow within a trained neural net we manage to capture certain regularities that allow us, for example, to make certain predictions.

In a sense the predictions tend to be very “human style”, often looking “roughly right” to us, even though at the level of precise formal detail they’re not quite right. And fundamentally they rely on computational reducibility—and when computational irreducibility is present they more or less inevitably fail. In a sense, the AI is doing “shallow computation”, but when there’s computational irreducibility one needs irreducible, deep computation to work out what will happen.

And there are plenty of places—even in working with traditional mathematical structures—where what AI does won’t be sufficient for what we expect to get out of science. But there are also places where “AI-style science” can make progress even when traditional methods cannot. If one’s doing something like solving a single equation (say, an ODE) precisely, AI probably won’t be the best tool. But if one’s got a big collection of equations (say for something like robotics) AI may successfully be able to give a useful “rough estimate” of what will happen, even when traditional methods would get utterly bogged down in details.

It’s a general feature of machine learning—and AI—techniques that they can be very useful if an approximate (“80%”) answer is good enough. But they tend to fail when one needs something more “precise” and “perfect”. And there are quite a few workflows in science (and probably more that can be identified) where this is exactly what one needs. “Pick out candidate cases for something”. “Identify a feature that might be important”. “Suggest a possible question to explore”.

There are clear limitations, though, particularly whenever there’s computational irreducibility. In a sense the typical AI approach to science doesn’t involve explicitly “formalizing things”. But in many areas of science formalization is precisely what’s been most valuable, and what’s allowed towers of results to be obtained. And in recent times we have the powerful new idea of formalizing things computationally—and in particular in using computational language to do this.

And given such a computational formalization, we’re able to start doing irreducible computations that let us reach discoveries we have no way to anticipate. We can, for example, enumerate possible computational systems or processes, and see “fundamental surprises”. In typical AI there’s randomness that gives us a certain degree of “originality” in our exploration. But it’s of a fundamentally lower level than we can reach with actual irreducible computations.

So what should we expect for AI in science going forward? We’ve got in a sense a new—and rather human-like—way of leveraging computational reducibility. It’s a new tool for doing science, destined to have many practical uses. In terms of fundamental potential for discovery, though, it pales in comparison to what we can build from the computational paradigm, and from irreducible computations that we do. But probably what will give us the greatest opportunity to move science forward is to combine the strengths of AI and of the formal computational paradigm. Which, yes, is part of what we’ve been vigorously pursuing in recent years with the Wolfram Language and its connections to machine learning and now LLMs.

Notes

My goal here has been to outline my current thinking about the fundamental potential (and limitations) of AI in science—developing my ideas by using the Wolfram Language and its AI capabilities to do various simple experiments. I view what I’ve done here as just a beginning. Essentially every experiment could, for example, be done in much more detail, and with much more analysis. (And just click any image to get the Wolfram Language that made it, so you can repeat or extend it.)

“AI in science” is a hot topic these days in the world at large, and I am surely aware only of a small part of everything that’s been done. My own emphasis has been on trying to “do the obvious experiments” and trying to piece together for myself the “big picture” of what’s going on. I should emphasize that there’s been a regular stream of outstanding and impressive “engineering innovations” in AI in recent times, and I won’t be at all surprised if experiments that haven’t worked well for me could be dramatically improved by future such innovations, conceivably even changing my “big-picture” conclusions from them.

I must also offer an apology. While I’ve been exposed—though often basically just “through the grapevine”—to lots of things being done on “AI in science”, especially over the past year, I haven’t made any serious attempt to systematically study the literature of the field, or trace its history and the provenance of ideas in it. So I must leave it to others to make connections between what I’ve done here and what other people may (or may not) have done elsewhere. It’d be fascinating to do a serious analysis of the history of work on AI in science, but it’s not something I’ve had a chance to do.

In my efforts here I have been greatly assisted by Wolfram Institute fellows Richard Assar (“Ruliad Fellow”) and Nik Murzin (“Fourmilab Fellow”). I’m also grateful to the many people who I’ve talked to—or heard from—about AI in science (and related topics) in recent times, including Giulio Alessandrini, Mohammed AlQuraishi, Brian Frezza, Roger Germundsson, George Morgan, Michael Trott and Christopher Wolfram.

The Story Continues: Announcing Version 14 of Wolfram Language and Mathematica

Stephen Wolfram — Tue, 09 Jan 2024 22:33:01 +0000

Version 14.0 of Wolfram Language and Mathematica is available immediately both on the desktop and in the cloud. See also more detailed information on Version 13.1, Version 13.2 and Version 13.3.

Building Something Greater and Greater… for 35 Years and Counting

Today we celebrate a new waypoint on our journey of nearly four decades with the release of Version 14.0 of Wolfram Language and Mathematica. Over the two years since we released Version 13.0 we’ve been steadily delivering the fruits of our research and development in .1 releases every six months. Today we’re aggregating these—and more—into Version 14.0.

It’s been more than 35 years now since we released Version 1.0. And all those years we’ve been continuing to build a taller and taller tower of capabilities, progressively expanding the scope of our vision and the breadth of our computational coverage of the world:

Version 1.0 had 554 built-in functions; in Version 14.0 there are 6602. And behind each of those functions is a story. Sometimes it’s a story of creating a superalgorithm that encapsulates decades of algorithmic development. Sometimes it’s a story of painstakingly curating data that’s never been assembled before. Sometimes it’s a story of drilling down to the essence of something to invent new approaches and new functions that can capture it.

And from all these pieces we’ve been steadily building the coherent whole that is today’s Wolfram Language. In the arc of intellectual history it defines a broad, new, computational paradigm for formalizing the world. And at a practical level it provides a superpower for implementing computational thinking—and enabling “computational X” for all fields X.

To us it’s profoundly satisfying to see what has been done over the past three decades with everything we’ve built so far. So many discoveries, so many inventions, so much achieved, so much learned. And seeing this helps drive forward our efforts to tackle still more, and to continue to push every boundary we can with our R&D, and to deliver the results in new versions of our system.

Our R&D portfolio is broad. From projects that get completed within months of their conception, to projects that rely on years (and sometimes even decades) of systematic development. And key to everything we do is leveraging what we have already done—often taking what in earlier years was a pinnacle of technical achievement, and now using it as a routine building block to reach a level that could barely even be imagined before. And beyond practical technology, we’re also continually going further and further in leveraging what’s now the vast conceptual framework that we’ve been building all these years—and progressively encapsulating it in the design of the Wolfram Language.

We’ve worked hard all these years not only to create ideas and technology, but also to craft a practical and sustainable ecosystem in which we can systematically do this now and into the long-term future. And we continue to innovate in these areas, broadening the delivery of what we’ve built in new and different ways, and through new and different channels. And in the past five years we’ve also been able to open up our core design process to the world—regularly livestreaming what we’re doing in a uniquely open way.

And indeed over the past several years the seeds of essentially everything we’re delivering today in Version 14.0 have been openly shared with the world, and represent an achievement not only for our internal teams but also for the many people who have participated in and commented on our livestreams.

Part of what Version 14.0 is about is continuing to expand the domain of our computational language, and our computational formalization of the world. But Version 14.0 is also about streamlining and polishing the functionality we’ve already defined. Throughout the system there are things we’ve made more efficient, more robust and more convenient. And, yes, in complex software, bugs of many kinds are a theoretical and practical inevitability. And in Version 14.0 we’ve fixed nearly 10,000 bugs, the majority found by our increasingly sophisticated internal software testing methods.

Now We Need to Tell the World

Even after all the work we’ve put into the Wolfram Language over the past several decades, there’s still yet another challenge: how to let people know just what the Wolfram Language can do. Back when we released Version 1.0 I was able to write a book of manageable size that could pretty much explain the whole system. But for Version 14.0—with all the functionality it contains—one would need a book with perhaps 200,000 pages.

And at this point nobody (even me!) immediately knows everything the Wolfram Language does. Of course one of our great achievements has been to maintain across all that functionality a tightly coherent and consistent design that results in there ultimately being only a small set of fundamental principles to learn. But at the vast scale of the Wolfram Language as it exists today, knowing what’s possible—and what can now be formulated in computational terms—is inevitably very challenging. And all too often when I show people what’s possible, I’ll get the response “I had no idea the Wolfram Language could do that!”

So in the past few years we’ve put increasing emphasis into building large-scale mechanisms to explain the Wolfram Language to people. It begins at a very fine-grained level, with “just-in-time information” provided, for example, through suggestions made when you type. Then for each function (or other construct in the language) there are pages that explain the function, with extensive examples. And now, increasingly, we’re adding “just-in-time learning material” that leverages the concreteness of the functions to provide self-contained explanations of the broader context of what they do.

By the way, in modern times we need to explain the Wolfram Language not just to humans, but also to AIs—and our very extensive documentation and examples have proved extremely valuable in training LLMs to use the Wolfram Language. And for AIs we’re providing a variety of tools—like immediate computable access to documentation, and computable error handling. And with our Chat Notebook technology there’s also a new “on ramp” for creating Wolfram Language code from linguistic (or visual, etc.) input.

But what about the bigger picture of the Wolfram Language? For both people and AIs it’s important to be able to explain things at a higher level, and we’ve been doing more and more in this direction. For more than 30 years we’ve had “guide pages” that summarize specific functionality in particular areas. Now we’re adding “core area pages” that give a broader picture of large areas of functionality—each one in effect covering what might otherwise be a whole product on its own, if it wasn’t just an integrated part of the Wolfram Language:

But we’re going even much further, building whole courses and books that provide modern hands-on Wolfram-Language-enabled introductions to a broad range of areas. We’ve now covered the material of many standard college courses (and quite a lot besides), in a new and very effective “computational” way, that allows immediate, practical engagement with concepts:

All these courses involve not only lectures and notebooks but also auto-graded exercises, as well as official certifications. And we have a regular calendar of everyone-gets-together-at-the-same-time instructor-led peer Study Groups about these courses. And, yes, our Wolfram U operation is now emerging as a significant educational entity, with many thousands of students at any given time.

In addition to whole courses, we have “miniseries” of lectures about specific topics:

And we also have courses—and books—about the Wolfram Language itself, like my Elementary Introduction to the Wolfram Language, which came out in a third edition this year (and has an associated course, online version, etc.):

In a somewhat different direction, we’ve expanded our Wolfram Summer School to add a Wolfram Winter School, and we’ve greatly expanded our Wolfram High School Summer Research Program, adding year-round programs, middle-school programs, etc.—including the new “Computational Adventures” weekly activity program.

And then there’s livestreaming. We’ve been doing weekly “R&D livestreams” with our development team (and sometimes also external guests). And I myself have also been doing a lot of livestreaming (232 hours of it in 2023 alone)—some of it design reviews of Wolfram Language functionality, and some of it answering questions, technical and other.

The list of ways we’re getting the word out about the Wolfram Language goes on. There’s Wolfram Community, which is full of interesting contributions, and has ever-increasing readership. There are sites like Wolfram Challenges. There are our Wolfram Technology Conferences. And lots more.

We’ve put immense effort into building the whole Wolfram technology stack over the past four decades. And even as we continue to aggressively build it, we’re putting more and more effort into telling the world about just what’s in it, and helping people (and AIs) to make the most effective use of it. But in a sense, everything we’re doing is just a seed for what the wider community of Wolfram Language users are doing, and can do. Spreading the power of the Wolfram Language to more and more people and areas.

The LLMs Have Landed

The machine learning superfunctions Classify and Predict first appeared in Wolfram Language in 2014 (Version 10). By the next year there were starting to be functions like ImageIdentify and LanguageIdentify, and within a couple of years we’d introduced our whole neural net framework and Neural Net Repository. Included in that were a variety of neural nets for language modeling, that allowed us to build out functions like SpeechRecognize and an experimental version of FindTextualAnswer. But—like everyone else—we were taken by surprise at the end of 2022 by ChatGPT and its remarkable capabilities.

Very quickly we realized that a major new use case—and market—had arrived for Wolfram|Alpha and Wolfram Language. For now it was not only humans who’d need the tools we’d built; it was also AIs. By March 2023 we’d worked with OpenAI to use our Wolfram Cloud technology to deliver a plugin to ChatGPT that allows it to call Wolfram|Alpha and Wolfram Language. LLMs like ChatGPT provide remarkable new capabilities in reproducing human language, basic human thinking and general commonsense knowledge. But—like unaided humans—they’re not set up to deal with detailed computation or precise knowledge. For that, like humans, they have to use formalism and tools. And the remarkable thing is that the formalism and tools we’ve built in Wolfram Language (and Wolfram|Alpha) are basically a broad, perfect fit for what they need.

We created the Wolfram Language to provide a bridge from what humans think about to what computation can express and implement. And now that’s what the AIs can use as well. The Wolfram Language provides a medium not only for humans to “think computationally” but also for AIs to do so. And we’ve been steadily doing the engineering to let AIs call on Wolfram Language as easily as possible.

But in addition to LLMs using Wolfram Language, there’s also now the possibility of Wolfram Language using LLMs. And already in June 2023 (Version 13.3) we released a major collection of LLM-based capabilities in Wolfram Language. One category is LLM functions, which effectively use LLMs as “internal algorithms” for operations in Wolfram Language:

In typical Wolfram Language fashion, we have a symbolic representation for LLMs: LLMConfiguration[…] represents an LLM with its various parameters, promptings, etc. And in the past few months we’ve been steadily adding connections to the full range of popular LLMs, making Wolfram Language a unique hub not only for LLM usage, but also for studying the performance—and science—of LLMs.

You can define your own LLM functions in Wolfram Language. But there’s also the Wolfram Prompt Repository that plays a similar role for LLM functions as the Wolfram Function Repository does for ordinary Wolfram Language functions. There’s a public Prompt Repository that so far has several hundred curated prompts. But it’s also possible for anyone to post their prompts in the Wolfram Cloud and make them publicly (or privately) accessible. The prompts can define personas (“talk like a [stereotypical] pirate”). They can define AI-oriented functions (“write it with emoji”). And they can define modifiers that affect the form of output (“haiku style”).
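To make the idea of a reusable prompt concrete, here is a Python sketch of a prompt template turned into an ordinary callable, with optional persona and modifier text wrapped around it. It is an illustration under stated assumptions, not the Wolfram Language implementation: make_llm_function and echo_llm are hypothetical names, and echo_llm is a fake model that simply returns its prompt so the example runs without any LLM service.

```python
from typing import Callable

# Hypothetical stand-in for a model call: any function from prompt text to reply text.
LLM = Callable[[str], str]


def make_llm_function(template: str, llm: LLM,
                      persona: str = "", modifier: str = "") -> Callable[..., str]:
    """Turn a prompt template with {} slots into an ordinary callable.

    persona is placed before the filled-in template and modifier after it,
    loosely mirroring the persona/modifier roles of prompts described above.
    Empty pieces are skipped.
    """
    def fn(*args: str) -> str:
        parts = [persona, template.format(*args), modifier]
        prompt = "\n".join(p for p in parts if p)
        return llm(prompt)
    return fn


def echo_llm(prompt: str) -> str:
    """Fake model that returns its prompt unchanged, so the sketch runs offline."""
    return prompt


if __name__ == "__main__":
    antonym = make_llm_function("Give an antonym of: {}", echo_llm,
                                modifier="Answer in haiku style.")
    print(antonym("hot"))
```

With the fake model, antonym("hot") just shows the assembled prompt: the line “Give an antonym of: hot” followed by “Answer in haiku style.” Passing a function that actually calls a model in place of echo_llm is where a real system would differ.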

In addition to calling LLMs “programmatically” within Wolfram Language, there’s the new concept (first introduced in Version 13.3) of “Chat Notebooks”. Chat Notebooks represent a new kind of user interface, that combines the graphical, computational and document features of traditional Wolfram Notebooks with the new linguistic interface capabilities brought to us by LLMs.

The basic idea of a Chat Notebook—as introduced in Version 13.3, and now extended in Version 14.0—is that you can have “chat cells” (requested by typing ') whose content gets sent not to the Wolfram kernel, but instead to an LLM:

You can use “function prompts”—say from the Wolfram Prompt Repository—directly in a Chat Notebook:

And as of Version 14.0 you can also knit Wolfram Language computations directly into your “conversation” with the LLM:

(You type \ to insert Wolfram Language, very much like the way you can use <* … *> to insert Wolfram Language into external evaluation cells.)

One thing about Chat Notebooks is that—as their name suggests—they really are centered around “chatting”, and around having a sequential interaction with an LLM. In an ordinary notebook, it doesn’t matter where in the notebook each Wolfram Language evaluation is requested; all that’s relevant is the order in which the Wolfram kernel does the evaluations. But in a Chat Notebook the “LLM evaluations” are always part of a “chat” that’s explicitly laid out in the notebook.

A key part of Chat Notebooks is the concept of a chat block: type ~ and you get a separator in the notebook that “starts a new chat”:
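One way to picture how a chat block delimits what the LLM sees is the following Python sketch. It assumes, for illustration only, a notebook modeled as a list of hypothetical Cell records, and that the context for a new chat cell is every cell laid out above it back to the nearest separator; the actual rules Chat Notebooks use may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    kind: str  # hypothetical kinds: "chat", "input", "output", "separator"
    text: str


def chat_context(cells: list[Cell], current: int) -> list[Cell]:
    """Return the cells laid out above position `current`, back to the nearest separator.

    Layout order in the notebook, not evaluation order, decides what is included.
    """
    context = []
    for cell in reversed(cells[:current]):
        if cell.kind == "separator":
            break  # a separator starts a new chat, so nothing above it is included
        context.append(cell)
    return list(reversed(context))


if __name__ == "__main__":
    notebook = [
        Cell("chat", "hello"),
        Cell("separator", ""),
        Cell("chat", "plot a sine curve"),
        Cell("input", "Plot[Sin[x], {x, 0, 10}]"),
        Cell("chat", "now make it cosine"),
    ]
    for cell in chat_context(notebook, 4):
        print(cell.kind, "|", cell.text)
```

In this model, the last chat cell sees only the two cells between the separator and itself; the “hello” above the separator belongs to a different chat.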

Chat Notebooks—with all their typical Wolfram Notebook editing, structuring, automation, etc. capabilities—are very powerful just as “LLM interfaces”. But there’s another dimension as well, enabled by LLMs being able to call Wolfram Language as a tool.

At one level, Chat Notebooks provide an “on ramp” for using Wolfram Language. Wolfram|Alpha—and even more so, Wolfram|Alpha Notebook Edition—let you ask questions in natural language, then have the questions translated into Wolfram Language, and answers computed. But in Chat Notebooks you can go beyond asking specific questions. Instead, through the LLM, you can just “start chatting” about what you want to do, then have Wolfram Language code generated, and executed:

The workflow is typically as follows. First, you have to conceptualize in computational terms what you want. (And, yes, that step requires computational thinking—which is a very important skill that too few people have so far learned.) Then you tell the LLM what you want, and it’ll try to write Wolfram Language code to achieve it. It’ll typically run the code for you (but you can also always do it yourself)—and you can see whether you got what you wanted. But what’s crucial is that Wolfram Language is intended to be read not only by computers but also by humans. And particularly since LLMs usually seem to manage to write pretty good Wolfram Language code, you can expect to be able to read what they wrote and see if it’s what you wanted. If it is, you can take that code, and use it as a “solid building block” for whatever larger system you might be trying to set up. Otherwise, you can either fix it yourself, or try chatting with the LLM to get it to fix it.

One of the things we see in the example above is the LLM—within the Chat Notebook—making a “tool call”, here to a Wolfram Language evaluator. In the Wolfram Language there’s now a whole mechanism for defining tools for LLMs—with each tool being represented by an LLMTool symbolic object. In Version 14.0 there’s an experimental version of the new Wolfram LLM Tool Repository with some predefined tools:

In a default Chat Notebook, the LLM has access to some default tools, which include not only the Wolfram Language evaluator, but also things like Wolfram documentation search and Wolfram|Alpha query. And it’s common to see the LLM go back and forth trying to write “code that works”, and for example sometimes having to “resort” (much like humans do) to reading the documentation.

Something that’s new in Version 14.0 is experimental access to multimodal LLMs that can take images as well as text as input. And when this capability is enabled, it allows the LLM to “look at pictures from the code it generated”, see if they’re what was asked for, and potentially correct itself:

The deep integration of images into Wolfram Language—and Wolfram Notebooks—yields all sorts of possibilities for multimodal LLMs. Here we’re giving a plot as an image and asking the LLM how to reproduce it:

Another direction for multimodal LLMs is to take data (in the hundreds of formats accepted by Wolfram Language) and use the LLM to guide its visualization and analysis in the Wolfram Language. Here’s an example that starts from a file data.csv in the current directory on your computer:

One thing that’s very nice about using Wolfram Language directly is that everything you do (well, unless you use RandomInteger, etc.) is completely reproducible; do the same computation twice and you’ll get the same result. That’s not true with LLMs (at least right now). And so when one uses LLMs it feels like something more ephemeral and fleeting than using Wolfram Language. One has to grab any good results one gets—because one might never be able to reproduce them. Yes, it’s very helpful that one can store everything in a Chat Notebook, even if one can’t rerun it and get the same results. But the more “permanent” use of LLM results tends to be “offline”. Use an LLM “up front” to figure something out, then just use the result it gave.

One unexpected application of LLMs for us has been in suggesting names of functions. With the LLM’s “experience” of what people talk about, it’s in a good position to suggest functions that people might find useful. And, yes, when it writes code it has a habit of hallucinating such functions. But in Version 14.0 we’ve actually added one function—DigitSum—that was suggested to us by LLMs. And in a similar vein, we can expect LLMs to be useful in making connections to external databases, functions, etc. The LLM “reads the documentation”, and tries to write Wolfram Language “glue” code—which then can be reviewed, checked, etc., and if it’s right, can be used henceforth.

Then there’s data curation, which is a field that—through Wolfram|Alpha and many of our other efforts—we’ve become extremely expert at over the past couple of decades. How much can LLMs help with that? They certainly don’t “solve the whole problem”, but integrating them with the tools we already have has allowed us over the past year to speed up some of our data curation pipelines by factors of two or more.

If we look at the whole stack of technology and content that’s in the modern Wolfram Language, the overwhelming majority of it isn’t helped by LLMs, and isn’t likely to be. But there are many—sometimes unexpected—corners where LLMs can dramatically improve heuristics or otherwise solve problems. And in Version 14.0 there are starting to be a wide variety of “LLM inside” functions.

An example is TextSummarize, which is a function we’ve considered adding for many versions—but now, thanks to LLMs, can finally implement to a useful level:

The main LLMs that we’re using right now are based on external services. But we’re building capabilities to allow us to run LLMs in local Wolfram Language installations as soon as that’s technically feasible. And one capability that’s actually part of our mainline machine learning effort is NetExternalObject—a way of representing symbolically an externally defined neural net that can be run inside Wolfram Language. NetExternalObject allows you, for example, to take any network in ONNX form and effectively treat it as a component in a Wolfram Language neural net. Here’s a network for image depth estimation—that we’re here importing from an external repository (though in this case there’s actually a similar network already in the Wolfram Neural Net Repository):

Now we can apply this imported network to an image that’s been encoded with our built-in image encoder, then take the result and visualize it:

It’s often very convenient to be able to run networks locally, but it can sometimes take quite high-end hardware to do so. For example, there’s now a function in the Wolfram Function Repository that does image synthesis entirely locally—but to run it, you do need a GPU with at least 8 GB of VRAM:

By the way, based on LLM principles (and ideas like transformers) there’ve been other related advances in machine learning that have been strengthening a whole range of Wolfram Language areas—with one example being image segmentation, where ImageSegmentationComponents now provides robust “content-sensitive” segmentation:

Still Going Strong on Calculus

When Mathematica 1.0 was released in 1988, it was a “wow” that, yes, now one could routinely do integrals symbolically by computer. And it wasn’t long before we got to the point—first with indefinite integrals, and later with definite integrals—where what’s now the Wolfram Language could do integrals better than any human. So did that mean we were “finished” with calculus? Well, no. First there were differential equations, and partial differential equations. And it took a decade to get symbolic ODEs to a beyond-human level. And with symbolic PDEs it took until just a few years ago. Somewhere along the way we built out discrete calculus, asymptotic expansions and integral transforms. And we also implemented lots of specific features needed for applications like statistics, probability, signal processing and control theory. But even now there are still frontiers.

And in Version 14 there are significant advances around calculus. One category concerns the structure of answers. Yes, one can have a formula that correctly represents the solution to a differential equation. But is it in the best, simplest or most useful form? Well, in Version 14 we’ve worked hard to make sure it is—often dramatically reducing the size of expressions that get generated.

Another advance has to do with expanding the range of “pre-packaged” calculus operations. We’ve been able to do derivatives ever since Version 1.0. But in Version 14 we’ve added implicit differentiation. And, yes, one can give a basic definition for this easily enough using ordinary differentiation and equation solving. But by adding an explicit ImplicitD we’re packaging all that up—and handling the tricky corner cases—so that it becomes routine to use implicit differentiation wherever you want:
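For reference, here is the textbook rule that implicit differentiation packages up. If F(x, y) = 0 defines y implicitly as a function of x, and ∂F/∂y ≠ 0, then differentiating both sides with respect to x and solving gives the formula below. The unit circle is the standard worked case. This is the general mathematics only. It does not describe how ImplicitD is implemented or which corner cases it handles.

```latex
\frac{d}{dx}F\bigl(x, y(x)\bigr)
  = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\,\frac{dy}{dx} = 0
\quad\Longrightarrow\quad
\frac{dy}{dx} = -\,\frac{\partial F/\partial x}{\partial F/\partial y}

% Worked case: the unit circle F(x, y) = x^2 + y^2 - 1 = 0
\frac{dy}{dx} = -\frac{2x}{2y} = -\frac{x}{y}, \qquad y \neq 0
```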

Another category of pre-packaged calculus operations new in Version 14 consists of functions for vector-based integration. These were always possible to do in a “do-it-yourself” mode. But in Version 14 they are now streamlined built-in functions that, by the way, also cover corner cases, etc. And what made them possible is actually a development in another area: our decade-long project to add geometric computation to Wolfram Language, which gave us a natural way to describe geometric constructs such as curves and surfaces:
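To make “vector-based integration” concrete, here is a minimal numerical sketch in Python. It is not Wolfram Language, and it is not how the built-in functions work internally. It approximates the line integral of a 2D vector field along a parametrized curve using the midpoint rule; line_integral is a hypothetical helper. For F(x, y) = (−y, x) around the unit circle, the integrand F·r′(t) is identically 1. The exact answer is therefore 2π, which makes the result easy to check.

```python
import math

def line_integral(field, curve, dcurve, t0, t1, n=10_000):
    """Approximate the integral of field . dr along curve(t), t in [t0, t1],
    using the composite midpoint rule with n subintervals."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h          # midpoint of the i-th subinterval
        x, y = curve(t)
        fx, fy = field(x, y)
        dx, dy = dcurve(t)              # tangent vector r'(t)
        total += (fx * dx + fy * dy) * h
    return total

# Vector field F(x, y) = (-y, x), traversed once around the unit circle.
field = lambda x, y: (-y, x)
circle = lambda t: (math.cos(t), math.sin(t))
dcircle = lambda t: (-math.sin(t), math.cos(t))

result = line_integral(field, circle, dcircle, 0.0, 2 * math.pi)
print(result)  # close to 2*pi
```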

Related functionality new in Version 14 is ContourIntegrate:

Functions like ContourIntegrate just “get the answer”. But if one’s learning or exploring calculus it’s often also useful to be able to do things in a more step-by-step way. In Version 14 you can start with an inactive integral

and explicitly do operations like changing variables:

Sometimes actual answers get expressed in inactive form, particularly as infinite sums:

And now in Version 14 the function TruncateSum lets you take such a sum and generate a truncated “approximation”:
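The idea behind truncating an infinite sum can be sketched in a few lines of Python. The helper truncate_sum is hypothetical and simply keeps a fixed number of leading terms. It says nothing about how TruncateSum decides where or how to truncate. Exact fractions make the point that the partial sum for e = Σ 1/k! is a rational approximation, and its error is exactly the neglected tail.

```python
import math
from fractions import Fraction

def truncate_sum(term, n_terms, start=0):
    """Return the partial sum term(start) + ... + term(start + n_terms - 1)."""
    return sum(term(k) for k in range(start, start + n_terms))

# The series e = sum over k >= 0 of 1/k!, cut off after its first 10 terms.
partial = truncate_sum(lambda k: Fraction(1, math.factorial(k)), 10)
print(partial)                  # an exact rational approximation of e
print(math.e - float(partial))  # the neglected tail: a small positive number
```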

Functions like D and Integrate—as well as LineIntegrate and SurfaceIntegrate—are, in a sense, “classic calculus”, taught and used for more than three centuries. But in Version 14 we also support what we can think of as “emerging” calculus operations, like fractional differentiation:
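For a sense of what fractional differentiation means, consider the power rule for the Riemann–Liouville derivative with base point 0. It generalizes the ordinary power rule by means of the gamma function. The Python sketch below applies only that rule, and only to monomials. It illustrates one common convention and is not a statement of which definition the built-in function uses. The helper name is hypothetical.

```python
import math

def frac_deriv_monomial(coeff, k, alpha):
    """Riemann-Liouville derivative of order alpha (base point 0) of coeff * x**k.

    Uses the power rule D^alpha x^k = Gamma(k+1) / Gamma(k+1-alpha) * x^(k-alpha),
    valid for k > -1, and returns the new (coefficient, exponent).
    math.gamma raises ValueError when k + 1 - alpha is a non-positive integer,
    which is where the true coefficient is 0 (e.g. the derivative of a constant)."""
    return coeff * math.gamma(k + 1) / math.gamma(k + 1 - alpha), k - alpha

# Half-derivative of x: coefficient Gamma(2)/Gamma(3/2) = 2/sqrt(pi), exponent 1/2.
c_half, e_half = frac_deriv_monomial(1.0, 1, 0.5)
print(c_half, e_half)

# Two successive half-derivatives of x**2 give the ordinary derivative 2*x.
c1, e1 = frac_deriv_monomial(1.0, 2, 0.5)
c2, e2 = frac_deriv_monomial(c1, e1, 0.5)
print(c2, e2)  # approximately 2.0, and exactly 1.0
```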

Core Language

What are the primitives from which we can best build our conception of computation? That’s at some level the question I’ve been asking for more than four decades, and what’s determined the functions and structures at the core of the Wolfram Language.

And as the years go by, and we see more and more of what’s possible, we recognize and invent new primitives that will be useful. And, yes, the world—and the ways people interact with computers—change too, opening up new possibilities and bringing new understanding of things. Oh, and this year there are LLMs which can “get the intellectual sense of the world” and suggest new functions that can fit into the framework we’ve created with the Wolfram Language. (And, by the way, there’ve also been lots of great suggestions made by the audiences of our design review livestreams.)

One new construct added in Version 13.1—and that I personally have found very useful—is Threaded. When a function is listable—as Plus is—the top levels of lists get combined:

But sometimes you want one list to be “threaded into” the other at the lowest level, not the highest. And now there’s a way to specify that, using Threaded:
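The difference between the two behaviors can be mimicked in plain Python with nested lists. This is a sketch only, and the helper names are hypothetical. Ordinary listable addition pairs up top-level elements, so each entry of the second list is added to a whole row. Threading at the lowest level instead adds the vector to every innermost row.

```python
def add_top_level(a, b):
    """Listable-style addition: pair up top-level elements of two lists
    (which must have equal length), recursing into nested lists."""
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("lists of unequal length cannot be combined")
        return [add_top_level(x, y) for x, y in zip(a, b)]
    if isinstance(a, list):
        return [add_top_level(x, b) for x in a]
    if isinstance(b, list):
        return [add_top_level(a, y) for y in b]
    return a + b

def add_threaded(a, vec):
    """Thread the flat list vec into a at the lowest level: every innermost
    list of a is combined elementwise with vec."""
    if isinstance(a, list) and a and not isinstance(a[0], list):
        return add_top_level(a, vec)
    return [add_threaded(row, vec) for row in a]

m = [[1, 2], [3, 4]]
print(add_top_level(m, [10, 20]))  # [[11, 12], [23, 24]]
print(add_threaded(m, [10, 20]))   # [[11, 22], [13, 24]]
```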

In a sense, Threaded is part of a new wave of symbolic constructs that have “ambient effects” on lists. One very simple example (introduced in 2015) is Nothing:

Another, introduced in 2020, is Splice:
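Here is a Python sketch of these “ambient” effects. It makes the simplifying assumption that they act only when a list is built with a helper. In Wolfram Language they act automatically whenever a list is evaluated. NOTHING, SpliceIn and build_list are hypothetical stand-ins.

```python
NOTHING = object()  # sentinel: silently dropped from any list built below

class SpliceIn:
    """Wrapper whose items are spliced into the enclosing list."""
    def __init__(self, items):
        self.items = list(items)

def build_list(*elements):
    """Construct a list, dropping NOTHING and flattening SpliceIn wrappers
    by one level."""
    out = []
    for e in elements:
        if e is NOTHING:
            continue
        if isinstance(e, SpliceIn):
            out.extend(e.items)
        else:
            out.append(e)
    return out

print(build_list(1, NOTHING, 2, SpliceIn([3, 4]), 5))  # [1, 2, 3, 4, 5]
```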

An old chestnut of Wolfram Language design concerns the way infinite evaluation loops are handled. And in Version 13.2 we introduced the symbolic construct TerminatedEvaluation to provide a better definition of how out-of-control evaluations have been terminated:

In a curious connection, in the computational representation of physics in our recent Physics Project, the direct analog of nonterminating evaluations is what makes possible the seemingly unending universe in which we live.

But what is actually going on “inside an evaluation”, terminating or not? I’ve always wanted a good representation of this. And in fact back in Version 2.0 we introduced Trace for this purpose:

But just how much detail of what the evaluator does should one show? Back in Version 2.0 we introduced the option TraceOriginal that traces every path followed by the evaluator:

But often this is way too much. And in Version 14.0 we’ve introduced the new setting TraceOriginal→Automatic, which doesn’t include in its output evaluations that don’t do anything:

This may seem pedantic, but when one has an expression of any substantial size, it’s a crucial piece of pruning. So, for example, here’s a graphical representation of a simple arithmetic evaluation, with TraceOriginal→True:

And here’s the corresponding “pruned” version, with TraceOriginal→Automatic:

(And, yes, the structures of these graphs are closely related to things like the causal graphs we construct in our Physics Project.)

In the effort to add computational primitives to the Wolfram Language, two new entrants in Version 14.0 are Comap and ComapApply. The function Map takes a function f and “maps it” over a list:

Comap does the “mathematically co-” version of this, taking a list of functions and “comapping” them onto a single argument:

Why is this useful? As an example, one might want to apply three different statistical functions to a single list. And now it’s easy to do that, using Comap:

By the way, as with Map, there’s also an operator form for Comap:

Comap works well when the functions it’s dealing with take just one argument. If one has functions that take multiple arguments, ComapApply is what one typically wants:
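Python has no built-in Comap, but both operations are one-liners. The functions comap and comap_apply below are hypothetical analogs, and functools.partial stands in for the operator form. The example applies min, mean and max to a single list, in the spirit of applying several statistical functions to one dataset.

```python
import statistics
from functools import partial

def comap(functions, x):
    """Apply each function in a list to the same single argument."""
    return [f(x) for f in functions]

def comap_apply(functions, args):
    """Apply each function to the same sequence of arguments, unpacked."""
    return [f(*args) for f in functions]

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(comap([min, statistics.mean, max], data))  # [1, 3.875, 9]

# An "operator form": fix the functions now, supply the argument later.
summarize = partial(comap, [min, max])
print(summarize([2, 7, 1]))  # [1, 7]

# Two-argument functions, each applied to the same pair of arguments.
print(comap_apply([max, min, pow], (2, 5)))  # [5, 2, 32]
```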

Talking of “co-like” functions, a new function added in Version 13.2 is PositionSmallest. Min gives the smallest element in a list; PositionSmallest instead says where the smallest elements are:
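Here is a Python analog of PositionSmallest, under the assumption that all tied positions should be reported. Python indices are 0-based, whereas Wolfram Language parts are 1-based. The example result [1, 3] therefore corresponds to parts 2 and 4 there. positions_smallest is a hypothetical name.

```python
def positions_smallest(values):
    """Return every 0-based index at which the minimum value occurs."""
    smallest = min(values)
    return [i for i, v in enumerate(values) if v == smallest]

print(positions_smallest([3, 1, 4, 1, 5]))  # [1, 3]
```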

One of the important objectives in the Wolfram Language is to have as much as possible “just work”. When we released Version 1.0, strings could be assumed just to contain ordinary ASCII characters, or perhaps to have an external character encoding defined. And, yes, it could be messy not to know “within the string itself” what characters were supposed to be there. By the time of Version 3.0 in 1996 we’d become contributors to, and early adopters of, Unicode, which provided a standard encoding for “16-bits’-worth” of characters. And for many years this served us well. But in time, and particularly with the growth of emoji, 16 bits wasn’t enough to encode all the characters people wanted to use. So a few years ago we began rolling out support for 32-bit Unicode, and in Version 13.1 we integrated it into notebooks, in effect making strings something much richer than before:
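Python makes it easy to see why 16 bits stopped being enough, because its strings store full Unicode code points. The grinning-face emoji U+1F600 lies above 0xFFFF, so it cannot fit in one 16-bit unit, and UTF-16 has to encode it as a surrogate pair of two units. This illustrates Unicode itself, not how Wolfram Language stores strings internally.

```python
s = "A\u00e9\U0001F600"  # 'A', 'é', and the grinning-face emoji U+1F600

for ch in s:
    cp = ord(ch)
    utf16_units = len(ch.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 unit
    print(f"U+{cp:04X}  fits in 16 bits: {cp <= 0xFFFF}  UTF-16 units: {utf16_units}")
```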

And, yes, you can use Unicode everywhere now:

Video as a Fundamental Object

Back when Version 1.0 was released, a megabyte was a lot of memory. But 35 years later we routinely deal with gigabytes. And one of the things this makes practical is computation with video. We first introduced Video experimentally in Version 12.1 in 2020. And over the past three years we’ve been systematically broadening and strengthening our ability to deal with video in Wolfram Language. Probably the single most important advance is that things around video now (as much as possible) “just work”, without “creaking” under the strain of handling such large amounts of data.

We can directly capture video into notebooks, and we can robustly play video anywhere within a notebook. We’ve also added options for where to store the video so that it’s conveniently accessible to you and anyone else you want to give access to it.

There’s lots of complexity in the encoding of video—and we now robustly and transparently support more than 500 codecs. We also do lots of convenient things automatically, like rotating portrait-mode videos—and being able to apply image processing operations like ImageCrop across whole videos. In every version, we’ve been further optimizing the speed of some video operation or another.

But a particularly big focus has been on video generators: programmatic ways to produce videos and animations. One basic example is AnimationVideo, which produces the same kind of output as Animate, but as a Video object that can either be displayed directly in a notebook, or exported in MP4 or some other format:

AnimationVideo is based on computing each frame in a video by evaluating an expression. Another class of video generators takes an existing visual construct, and simply “tours” it. TourVideo “tours” images, graphics and geo graphics; Tour3DVideo (new in Version 14.0) tours 3D geometry:

A very powerful capability in Wolfram Language is being able to apply arbitrary functions to videos. One example of how this can be done is VideoFrameMap, which maps a function across frames of a video, and which was made efficient in Version 13.2:

And although Wolfram Language isn’t intended as an interactive video editing system, we’ve made sure that it’s possible to do streamlined programmatic video editing in the language, and for example in Version 14.0 we’ve added things like transition effects in VideoJoin and timed overlays in OverlayVideo.

So Much Got Faster, Stronger, Sleeker

With every new version of Wolfram Language we add new capabilities to extend yet further the domain of the language. But we also put a lot of effort into something less immediately visible: making existing capabilities faster, stronger and sleeker.

And in Version 14 two areas where we can see some examples of all these are dates and quantities. We introduced the notion of symbolic dates (DateObject, etc.) nearly a decade ago. And over the years since then we’ve built many things on this structure. And in the process of doing this it’s become clear that there are certain flows and paths that are particularly common and convenient. At the beginning what mattered most was just to make sure that the relevant functionality existed. But over time we’ve been able to see what should be streamlined and optimized, and we’ve steadily been doing that.

In addition, as we’ve worked towards new and different applications, we’ve seen “corners” that need to be filled in. So, for example, astronomy is an area we’ve significantly developed in Version 14, and supporting astronomy has required adding several new “high-precision” time capabilities, such as the TimeSystem option, as well as new astronomy-oriented calendar systems. Another example concerns date arithmetic. What should happen if you want to add a month to January 30? Where should you land? Different kinds of business applications and contracts make different assumptions—and so we added a Method option to functions like DatePlus to handle this. Meanwhile, having realized that date arithmetic is involved in the “inner loop” of certain computations, we optimized it—achieving a more than 100x speedup in Version 14.0.
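The month-arithmetic ambiguity is easy to reproduce. The Python sketch below implements two common conventions for adding months. One clamps to the last valid day of the target month; the other carries the excess days into the following month. The function and its method labels are hypothetical and are not the settings of DatePlus’s Method option. They simply show why such an option is needed.

```python
import calendar
import datetime as dt

def add_months(d, n, method="clamp"):
    """Add n calendar months to date d.

    method="clamp":    keep the day if it exists, else use the month's last day.
    method="overflow": carry the excess days into the following month.
    (Both labels are hypothetical names for two common conventions.)"""
    total = d.month - 1 + n
    year, month = d.year + total // 12, total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    if d.day <= last_day:
        return dt.date(year, month, d.day)
    if method == "clamp":
        return dt.date(year, month, last_day)
    if method == "overflow":
        return dt.date(year, month, last_day) + dt.timedelta(days=d.day - last_day)
    raise ValueError(f"unknown method: {method}")

jan30 = dt.date(2023, 1, 30)
print(add_months(jan30, 1, "clamp"))     # 2023-02-28
print(add_months(jan30, 1, "overflow"))  # 2023-03-02
```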

Wolfram|Alpha has been able to deal with units ever since it was first launched in 2009 (now more than 10,000 of them). And in 2012 we introduced Quantity to represent quantities with units in the Wolfram Language. And over the past decade we’ve been steadily smoothing out a whole series of complicated gotchas and issues with units. For example, what does 100°C + 20°C mean? Well, the 20°C isn’t really the same kind of thing as the 100°C: one is a temperature, while the other is being used as a temperature difference. And now in Wolfram Language we have a systematic way to handle this, by distinguishing temperature and temperature difference units, so that we now write 100°C plus a 20°C temperature difference.
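The temperature distinction can be sketched with two small Python types. One represents absolute temperatures (points on a scale), and the other represents temperature differences (intervals). The class names are hypothetical. The rules they encode are the standard ones, including the fact that converting to kelvin shifts a temperature by 273.15 but leaves a difference unchanged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Celsius:
    """An absolute temperature: a point on the Celsius scale."""
    value: float

    def to_kelvin(self):
        return self.value + 273.15  # points shift by the scale offset

@dataclass(frozen=True)
class CelsiusDelta:
    """A temperature difference: an interval on the Celsius scale."""
    value: float

    def to_kelvin(self):
        return self.value  # intervals do not shift: a 1 degree C step is 1 K

def add(a, b):
    """Temperature + difference -> temperature; difference + difference -> difference."""
    if isinstance(a, Celsius) and isinstance(b, CelsiusDelta):
        return Celsius(a.value + b.value)
    if isinstance(a, CelsiusDelta) and isinstance(b, Celsius):
        return Celsius(a.value + b.value)
    if isinstance(a, CelsiusDelta) and isinstance(b, CelsiusDelta):
        return CelsiusDelta(a.value + b.value)
    raise TypeError("adding two absolute temperatures is not meaningful")

def subtract(a, b):
    """Temperature - temperature -> difference (the only case sketched here)."""
    if isinstance(a, Celsius) and isinstance(b, Celsius):
        return CelsiusDelta(a.value - b.value)
    raise TypeError("only temperature - temperature is sketched here")

print(add(Celsius(100), CelsiusDelta(20)))  # Celsius(value=120)
print(subtract(Celsius(100), Celsius(20)))  # CelsiusDelta(value=80)
print(Celsius(20).to_kelvin(), CelsiusDelta(20).to_kelvin())
```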

At first our priority with Quantity was to get it working as broadly as possible, and to integrate it as widely as possible into computations, visualizations, etc. across the system. But as its capabilities have expanded, so have its uses, repeatedly driving the need to optimize its operation for particular common cases. And indeed between Version 13 and Version 14 we’ve dramatically sped up many things related to Quantity, often by factors of 1000 or more.

Talking of speedups, another example—made possible by new algorithms operating on multithreaded CPUs—concerns polynomials. We’ve worked with polynomials in Wolfram Language since Version 1, but in Version 13.2 there was a dramatic speedup of up to 1000x on operations like polynomial factoring.

In addition, a new algorithm in Version 14.0 dramatically speeds up numerical solutions to polynomial and transcendental equations, and, together with the new MaxRoots option, allows us, for example, to pick off a few roots from a degree-one-million polynomial

or to find roots of a transcendental equation that we could not even attempt before without pre-specifying bounds on their values:

Another “old” piece of functionality with recent enhancement concerns mathematical functions. Ever since Version 1.0 we’ve set up mathematical functions so that they can be computed to arbitrary precision:

But in recent versions we’ve wanted to be “more precise about precision”, and to be able to rigorously compute just what range of outputs is possible given the range of values provided as input:
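Here is a toy Python sketch of the idea. Each function gets its own rule for mapping an input interval to an output interval guaranteed to contain every possible result, which is why coverage has to be built up function by function. A rigorous implementation would also round each endpoint outward to absorb floating-point error. This sketch ignores that, and the Interval class is hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extreme values of a product lie among the endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

def exp_interval(x):
    # exp is increasing, so [lo, hi] maps to [exp(lo), exp(hi)].
    return Interval(math.exp(x.lo), math.exp(x.hi))

def square_interval(x):
    # x**2 is not monotonic: if the interval contains 0, the minimum is 0.
    lo2, hi2 = x.lo * x.lo, x.hi * x.hi
    if x.lo <= 0 <= x.hi:
        return Interval(0.0, max(lo2, hi2))
    return Interval(min(lo2, hi2), max(lo2, hi2))

x = Interval(-1.0, 2.0)
print(square_interval(x))  # Interval(lo=0.0, hi=4.0): the exact range
print(x * x)               # Interval(lo=-2.0, hi=4.0): valid but looser
print(exp_interval(Interval(0.0, 1.0)))
```

The looser result from x * x shows why a function-specific rule matters: generic interval multiplication treats the two factors as independent.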

But every function for which we do this effectively requires a new theorem, and we’ve been steadily increasing the number of functions covered—now more than 130—so that this “just works” when you need to use it in a computation.

The Tree Story Continues

Trees are useful. We first introduced them as basic objects in the Wolfram Language only in Version 12.3. But now that they’re there, we’re discovering more and more places they can be used. And to support that, we’ve been adding more and more capabilities to them.

One area that’s advanced significantly since Version 13 is the rendering of trees. We tightened up the general graphic design, but, more importantly, we introduced many new options for how rendering should be done.

For example, here’s a random tree where we’ve specified that for all nodes only 3 children should be explicitly displayed: the others are elided away:

Here we’re adding several options to define the rendering of the tree:

By default, the branches in trees are labeled with integers, just like parts in an expression. But in Version 13.1 we added support for named branches defined by associations:

Our original conception of trees was very centered around having elements one would explicitly address, and that could have “payloads” attached. But what became clear is that there were applications where all that mattered was the structure of the tree, not anything about its elements. So we added UnlabeledTree to create “pure trees”:

Trees are useful because many kinds of structures are basically trees. And since Version 13 we’ve added capabilities for converting trees to and from various kinds of structures. For example, here’s a simple Dataset object:

You can use ExpressionTree to convert this to a tree:

And TreeExpression to convert it back:

We’ve also added capabilities for converting to and from JSON and XML, as well as for representing file directory structures as trees:
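The round trip between a nested structure and a tree is easy to sketch in Python for JSON-like data (dicts, lists and scalars). Node, to_tree, from_tree and show are hypothetical names. Each branch is labeled with its dictionary key or list index, loosely mirroring the named and integer branch labels described above, but the details are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Node:
    label: Any              # dict key or list index leading to this node
    kind: str               # "dict", "list" or "leaf"
    value: Any = None       # payload, used only for leaves
    children: list = field(default_factory=list)

def to_tree(obj, label=None):
    """Convert nested dicts/lists (e.g. parsed JSON) into a tree of Nodes."""
    if isinstance(obj, dict):
        return Node(label, "dict", children=[to_tree(v, k) for k, v in obj.items()])
    if isinstance(obj, list):
        return Node(label, "list", children=[to_tree(v, i) for i, v in enumerate(obj)])
    return Node(label, "leaf", value=obj)

def from_tree(node):
    """Invert to_tree, rebuilding the original nested structure."""
    if node.kind == "dict":
        return {c.label: from_tree(c) for c in node.children}
    if node.kind == "list":
        return [from_tree(c) for c in node.children]
    return node.value

def show(node, depth=0):
    """Print an indented outline of the tree."""
    name = "" if node.label is None else f"{node.label}: "
    text = repr(node.value) if node.kind == "leaf" else node.kind
    print("  " * depth + name + text)
    for c in node.children:
        show(c, depth + 1)

record = {"a": 1, "b": [2, 3]}
tree = to_tree(record)
show(tree)
print(from_tree(tree) == record)  # True: the round trip is lossless
```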

Finite Fields

In Version 1.0 we had integers, rational numbers and real numbers. In Version 3.0 we added algebraic numbers (represented implicitly by Root)—and a dozen years later we added algebraic number fields and transcendental roots. For Version 14 we’ve now added another (long-awaited) “number-related” construct: finite fields.

Here’s our symbolic representation of the field of integers modulo 7:

And now here’s a specific element of that field

which we can immediately compute with:

But what’s really important about what we’ve done with finite fields is that we’ve fully integrated them into other functions in the system. So, for example, we can factor a polynomial whose coefficients are in a finite field:
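Arithmetic in the field of integers modulo 7 fits in a short Python sketch, together with a brute-force root search that exposes the linear factors of a polynomial. The GF7 class is a hypothetical stand-in for the symbolic field objects described above. Exhaustive search works here only because the field is tiny, and it is not the factoring algorithm the system uses.

```python
from dataclasses import dataclass

P = 7  # the prime modulus: we work in the field of integers modulo 7

@dataclass(frozen=True)
class GF7:
    n: int

    def __post_init__(self):
        # Normalize the representative into 0..P-1 (frozen, so bypass __setattr__).
        object.__setattr__(self, "n", self.n % P)

    def __add__(self, other):
        return GF7(self.n + other.n)

    def __mul__(self, other):
        return GF7(self.n * other.n)

    def inverse(self):
        if self.n == 0:
            raise ZeroDivisionError("0 has no multiplicative inverse")
        return GF7(pow(self.n, -1, P))  # modular inverse (Python 3.8+)

    def __truediv__(self, other):
        return self * other.inverse()

a, b = GF7(3), GF7(5)
print(a + b, a * b, a / b)  # GF7(n=1) GF7(n=1) GF7(n=2)

# Each root r of a polynomial over GF(7) gives a linear factor (x - r), so
# x^2 + x + 1 = (x - 2)(x - 4) over GF(7).
roots = [r for r in range(P) if (r * r + r + 1) % P == 0]
print(roots)  # [2, 4]
```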

We can also do things like find solutions to equations over finite fields. So here, for example, is a point on a Fermat curve over the finite field GF(17³):

And here is a power of a matrix with elements over the same finite field:
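Matrix powers over a finite field are usually computed by repeated squaring, reducing every entry modulo the field size. The Python sketch below makes a deliberate simplification. It works over the prime field GF(17), not GF(17³): arithmetic in the extension field would require representing elements as polynomials modulo an irreducible cubic, which this sketch does not do. The function names are hypothetical.

```python
def mat_mul_mod(A, B, p):
    """Multiply square matrices A and B, reducing entries modulo the prime p."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]

def mat_pow_mod(A, e, p):
    """Compute A**e over GF(p) by repeated squaring (O(log e) multiplications)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    base = [[x % p for x in row] for row in A]
    while e > 0:
        if e & 1:
            result = mat_mul_mod(result, base, p)
        base = mat_mul_mod(base, base, p)
        e >>= 1
    return result

# Over GF(17): [[1,1],[1,0]]**10 = [[F11, F10], [F10, F9]] mod 17 = [[89, 55], [55, 34]] mod 17
print(mat_pow_mod([[1, 1], [1, 0]], 10, 17))  # [[4, 4], [4, 0]]
```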

Going Off Planet: The Astro Story

A major new capability added since Version 13 is astro computation. It begins with being able to compute to high precision the positions of things like planets. Even knowing what one means by “position” is complicated, though—with lots of different coordinate systems to deal with. By default AstroPosition gives the position in the sky at the current time from your Here location:

But one can instead ask about a different coordinate system, like global galactic coordinates:

And now here’s a plot of the distance between Saturn and Jupiter over a 50-year period:

In direct analogy to GeoGraphics, we’ve added AstroGraphics, here showing a patch of sky around the current position of Saturn:

And this now shows the sequence of positions for Saturn over the course of a couple of years—yes, including retrograde motion:

There are many styling options for AstroGraphics. Here we’re adding a background of the “galactic sky”:

And here we’re including renderings for constellations (and, yes, we had an artist draw them):

Something specifically new in Version 14.0 has to do with extended handling of solar eclipses. We always try to deliver new functionality as fast as we can. But in this case there was a very specific deadline: the total solar eclipse visible from the US on April 8, 2024. We’ve had the ability to do global computations about solar eclipses for some time (actually since shortly before the 2017 eclipse). But now we can also do detailed local computations right in the Wolfram Language.

So, for example, here’s a somewhat detailed overall map of the April 8, 2024, eclipse:

Now here’s a plot of the magnitude of the eclipse over a few hours, complete with a little “rampart” associated with the period of totality:

And here’s a map of the region of totality every minute just after the moment of maximum eclipse:

Millions of Species Become Computable

We first introduced computable data on biological organisms back when Wolfram|Alpha was released in 2009. But in Version 14—following several years of work—we’ve dramatically broadened and deepened the computable data we have about biological organisms.

So for example here’s how we can figure out what species have cheetahs as predators:

And here are pictures of these:

Here’s a map of countries where cheetahs have been seen (in the wild):

We now have data—curated from a great many sources—on more than a million species of animals, as well as most of the plants, fungi, bacteria, viruses and archaea that have been described. And for animals, for example, we have nearly 200 properties that are extensively filled in. Some are taxonomic properties:

Some are physical properties:

Some are genetic properties:

Some are ecological properties (yes, the cheetah is not the apex predator):

It’s useful to be able to get properties of individual species, but the real power of our curated computable data shows up when one does larger-scale analyses. Here, for example, is a plot of genome lengths for the organisms with the longest genomes in our collection:

Or here’s a histogram of the genome lengths for organisms in the human gut microbiome:

And here’s a scatterplot of the lifespans of birds against their weights:

Following the idea that cheetahs aren’t apex predators, this is a graph of what’s “above” them in the food chain:

Chemical Computation

We began the process of introducing chemical computation into the Wolfram Language in Version 12.0, and by Version 13 we had good coverage of atoms, molecules, bonds and functional groups. Now in Version 14 we’ve added coverage of chemical formulas, amounts of chemicals—and chemical reactions.

Here’s a chemical formula, which basically just gives a “count of atoms”:

Now here are specific molecules with that formula:

Let’s pick one of these molecules:

Now in Version 14 we have a way to represent a certain quantity of molecules of a given type—here 1 gram of methylcyclopentane:

ChemicalConvert can convert to a different specification of quantity, here moles:

And here a count of molecules:
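The arithmetic behind these conversions is simple to sketch in Python. Parse the formula into atom counts, sum standard atomic weights to get the molar mass, then divide and multiply by the Avogadro constant. Only a handful of conventional atomic weights are included, and the parser handles only plain formulas without parentheses or charges. The function names are hypothetical, and this is not how ChemicalConvert works internally.

```python
import re

# Conventional standard atomic weights in g/mol (a small subset).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
AVOGADRO = 6.02214076e23  # 1/mol, exact by the SI definition

def parse_formula(formula):
    """Count atoms in a plain formula such as 'C6H12'."""
    counts = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(num) if num else 1)
    return counts

def molar_mass(formula):
    """Molar mass in g/mol from atom counts and atomic weights."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())

formula = "C6H12"  # the molecular formula of methylcyclopentane
grams = 1.0
moles = grams / molar_mass(formula)
molecules = moles * AVOGADRO
print(parse_formula(formula))  # {'C': 6, 'H': 12}
print(moles, molecules)
```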

But now the bigger story is that in Version 14 we can represent not just individual types of molecules, and quantities of molecules, but also chemical reactions. Here we give a “sloppy” unbalanced representation of a reaction, and ReactionBalance gives us the balanced version:
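Balancing a reaction amounts to linear algebra. There is one equation per element and one unknown coefficient per species, and the balanced reaction is the smallest positive integer vector in the null space. The sketch below does this in Python with exact fractions for a textbook example, methane combustion. It is a generic method, not necessarily what ReactionBalance does. It handles only plain formulas and reactions with a unique balance, and the names are hypothetical.

```python
import re
from fractions import Fraction
from math import lcm  # Python 3.9+

def parse_formula(formula):
    """Atom counts for a plain formula such as 'CH4' (no parentheses)."""
    counts = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(num) if num else 1)
    return counts

def balance(reactants, products):
    """Smallest positive integer coefficients that conserve every element.
    Assumes the balance equations have a one-dimensional solution space."""
    species = reactants + products
    parsed = [parse_formula(s) for s in species]
    elements = sorted({el for p in parsed for el in p})
    # One row per element: reactant counts positive, product counts negative.
    rows = [[Fraction(p.get(el, 0)) * (1 if j < len(reactants) else -1)
             for j, p in enumerate(parsed)] for el in elements]
    n = len(species)
    pivots, r = [], 0
    for c in range(n):  # Gauss-Jordan elimination to reduced row echelon form
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if len(free) != 1:
        raise ValueError("reaction does not have a unique balance")
    x = [Fraction(0)] * n
    x[free[0]] = Fraction(1)            # set the free coefficient to 1
    for i, c in enumerate(pivots):
        x[c] = -rows[i][free[0]]        # solve for each pivot coefficient
    scale = lcm(*(v.denominator for v in x))
    coeffs = [int(v * scale) for v in x]
    return [-v for v in coeffs] if coeffs[0] < 0 else coeffs

print(balance(["CH4", "O2"], ["CO2", "H2O"]))  # [1, 2, 1, 2]
```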

And now we can extract the formulas for the reactants:

We can also give a chemical reaction in terms of molecules:

But with our symbolic representation of molecules and reactions, there’s now a big thing we can do: represent classes of reactions as “pattern reactions”, and work with them using the same kinds of concepts as we use in working with patterns for general expressions. So, for example, here’s a symbolic representation of the hydrohalogenation reaction:

Now we can apply this pattern reaction to particular molecules:

Here’s a more elaborate example, in this case entered using a SMARTS string:

Here we’re applying the reaction just once:

And now we’re doing it repeatedly

in this case generating longer and longer molecules (which in this case happen to be polypeptides):

The Knowledgebase Is Always Growing

Every minute of every day, new data is being added to the Wolfram Knowledgebase. Much of it is coming automatically from real-time feeds. But we also have a very large-scale ongoing curation effort with humans in the loop. We’ve built sophisticated (Wolfram Language) automation for our data curation pipeline over the years—and this year we’ve been able to increase efficiency in some areas by using LLM technology. But it’s hard to do curation right, and our long-term experience is that to do so ultimately requires human experts being in the loop, which we have.

So what’s new since Version 13.0? 291,842 new notable current and historical people; 264,467 music works; 118,538 music albums; 104,024 named stars; and so on. Sometimes the addition of an entity is driven by the new availability of reliable data; often it’s driven by the need to use that entity in some other piece of functionality (e.g. stars to render in AstroGraphics). But more than just adding entities there’s the issue of filling in values of properties of existing entities. And here again we’re always making progress, sometimes integrating newly available large-scale secondary data sources, and sometimes doing direct curation ourselves from primary sources.

A recent example where we needed to do direct curation was in data on alcoholic beverages. We have very extensive data on hundreds of thousands of types of foods and drinks. But none of our large-scale sources included data on alcoholic beverages. So that’s an area where we need to go to primary sources (in this case typically the original producers of products) and curate everything for ourselves.

So, for example, we can now ask for something like the distribution of flavors of different varieties of vodka (actually, personally, not being a consumer of such things, I had no idea vodka even had flavors…):

But beyond filling out entities and properties of existing types, we’ve also steadily been adding new entity types. One recent example is geological formations, 13,706 of them:

So now, for example, we can specify where T. rex have been found

and we can show those regions on a map:

Industrial-Strength Multidomain PDEs

PDEs are hard. It’s hard to solve them. And it’s hard to even specify what exactly you want to solve. But we’ve been on a multi-decade mission to “consumerize” PDEs and make them easier to work with. Many things go into this. You need to be able to easily specify elaborate geometries. You need to be able to easily define mathematically complicated boundary conditions. You need to have a streamlined way to set up the complicated equations that come out of underlying physics. Then you have to—as automatically as possible—do the sophisticated numerical analysis to efficiently solve the equations. But that’s not all. You also often need to visualize your solution, compute other things from it, or run optimizations of parameters over it.

It’s a deep use of what we’ve built with Wolfram Language—touching many parts of the system. And the result is something unique: a truly streamlined and integrated way to handle PDEs. One’s not dealing with some (usually very expensive) “just for PDEs” package; what we now have is a “consumerized” way to handle PDEs whenever they’re needed—for engineering, science, or whatever. And, yes, being able to connect machine learning, or image computation, or curated data, or data science, or real-time sensor feeds, or parallel computing, or, for that matter, Wolfram Notebooks, to PDEs just makes them so much more valuable.

We’ve had “basic, raw NDSolve” since 1991. But what’s taken decades to build is all the structure around that to let one conveniently set up (and efficiently solve) real-world PDEs, and connect them into everything else. It’s taken developing a whole tower of underlying algorithmic capabilities such as our more-flexible-and-integrated-than-ever-before industrial-strength computational geometry and finite element methods. But beyond that it’s taken creating a language for specifying real-world PDEs. And here the symbolic nature of the Wolfram Language, and our whole design framework, has made possible something unique that has allowed us to dramatically simplify and consumerize the use of PDEs.

It’s all about providing symbolic “construction kits” for PDEs and their boundary conditions. We started this about five years ago, progressively covering more and more application areas. In Version 14 we’ve particularly focused on solid mechanics, fluid mechanics, electromagnetics and (one-particle) quantum mechanics.

Here’s an example from solid mechanics. First, we define the variables we’re dealing with (displacement and underlying coordinates):

Next, we specify the parameters we want to use to describe the solid material we’re going to work with:

Now we can actually set up our PDE—using symbolic PDE specifications like SolidMechanicsPDEComponent—here for the deformation of a solid object pulled on one side:

And, yes, “underneath”, these simple symbolic specifications turn into a complicated “raw” PDE:

Now we are ready to actually solve our PDE in a particular region, i.e. for an object with a particular shape:

And now we can visualize the result, which shows how our object stretches when it’s pulled on:

The way we’ve set things up, the material for our object is an idealization of something like rubber. But in the Wolfram Language we now have ways to specify all sorts of detailed properties of materials. So, for example, we can add reinforcement along a particular direction, specified as a unit vector (in practice, say, with fibers), to our material:

Then we can rerun what we did before

but now we get a slightly different result:
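To make “reinforcement as a unit vector” concrete: a direction first has to be normalized, and many fiber-reinforced (transversely isotropic) material models then work with the structural tensor a ⊗ a built from it. The Python sketch below shows only those two steps, with hypothetical function names and an arbitrary 45° fiber direction; it is not the Wolfram Language material specification.

```python
import math

def unit_vector(v):
    """Normalize a direction vector so it can serve as a fiber direction."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("fiber direction must be nonzero")
    return [c / norm for c in v]

def structural_tensor(a):
    """Outer product a (x) a, a common way to encode a single fiber direction."""
    return [[ai * aj for aj in a] for ai in a]

fiber = unit_vector([1.0, 1.0, 0.0])   # fibers at 45 degrees in the x-y plane
M = structural_tensor(fiber)           # symmetric 3x3 matrix with trace 1
```

Because a has unit length, the trace of a ⊗ a is always 1, which is a handy sanity check on the normalization.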

Another major PDE domain that’s new in Version 14.0 is fluid flow. Let’s do a 2D example. Our variables are 2D velocity and pressure:

Now we can set up our fluid system in a particular region, with no-slip conditions on all walls except at the top where we assume fluid is flowing from left to right. The only parameter needed is the Reynolds number. And instead of just solving our PDEs for a single Reynolds number, let’s create a parametric solver that can take any specified Reynolds number:

Now here’s the result for Reynolds number 100:

But with the way we’ve set things up, we can just as well generate a whole video as a function of Reynolds number (and, yes, Parallelize speeds things up by generating different frames in parallel):
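The two ideas here, a parametric solver (do the setup once, then solve for any Reynolds number) and computing independent frames in parallel, can be sketched outside the Wolfram Language. The Python sketch below swaps in a much simpler problem of our own choosing: 1D steady convection-diffusion c′ = (1/Re) c″ on [0, 1] with c(0) = 0 and c(1) = 1, discretized with central differences and solved with the Thomas (tridiagonal) algorithm. It is not how NDSolve or the Version 14 fluid-flow components work internally.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def make_parametric_solver(n=200):
    """Set up a uniform grid once and return solve(re) for the toy problem
    c' = (1/re) c'' on [0, 1], with c(0) = 0 and c(1) = 1."""
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]

    def solve(re):
        # Central differences give (1 + p) c[i-1] - 2 c[i] + (1 - p) c[i+1] = 0, with p = re*h/2.
        p = re * h / 2.0
        lower, diag, upper = 1.0 + p, -2.0, 1.0 - p
        m = n - 1                      # number of interior unknowns
        rhs = [0.0] * m
        rhs[-1] = -upper * 1.0         # move the boundary value c(1) = 1 to the right-hand side
        # Thomas algorithm: forward elimination...
        cp, dp = [0.0] * m, [0.0] * m
        cp[0], dp[0] = upper / diag, rhs[0] / diag
        for i in range(1, m):
            denom = diag - lower * cp[i - 1]
            cp[i] = upper / denom
            dp[i] = (rhs[i] - lower * dp[i - 1]) / denom
        # ...then back substitution.
        interior = [0.0] * m
        interior[-1] = dp[-1]
        for i in range(m - 2, -1, -1):
            interior[i] = dp[i] - cp[i] * interior[i + 1]
        return xs, [0.0] + interior + [1.0]

    return solve

def exact_solution(re, x):
    """Closed-form solution of the toy problem, for checking."""
    return (math.exp(re * x) - 1.0) / (math.exp(re) - 1.0)

solve = make_parametric_solver(200)
reynolds_numbers = [1.0, 10.0, 50.0, 100.0]

# Each "frame" is an independent solve, so the frames can be computed concurrently.
with ThreadPoolExecutor() as pool:
    frames = list(pool.map(solve, reynolds_numbers))
```

The scheme behaves well while p = Re·h/2 ≤ 1, i.e. up to Re = 400 on this 200-cell grid; beyond that central differences start to oscillate. ThreadPoolExecutor keeps the sketch self-contained, but because of CPython's global interpreter lock it will not actually speed up this pure-Python loop; for CPU-bound work a ProcessPoolExecutor is the usual choice.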

Much of our work in PDEs involves catering to the complexities of real-world engineering situations. But in Version 14.0 we’re also adding features to support “pure physics”, and in particular to support quantum mechanics done with the Schrödinger equation. So here, for example, is the 2D 1-particle Schrödinger equation (with ):

Here’s the region we’re going to be solving over—showing explicit discretization:

Now we can solve the equation, adding in some boundary conditions:

And now we get to visualize a Gaussian wave packet scattering around a barrier:
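The initial wave packet itself is easy to write down. Here is a minimal 1D sketch in Python (the post's example is 2D, and the width, center and wavenumber below are arbitrary choices of ours): a Gaussian packet ψ(x) = (2πσ²)^(-1/4) exp(-(x - x0)²/(4σ²) + i k0 x), with a numerical check that its probability density integrates to 1 and is centered at x0. It does not attempt the time evolution or the barrier.

```python
import cmath
import math

def gaussian_packet(x, x0=0.0, sigma=1.0, k0=2.0):
    """1D Gaussian wave packet centered at x0, width sigma, mean wavenumber k0."""
    amplitude = (2.0 * math.pi * sigma ** 2) ** -0.25
    return amplitude * cmath.exp(-((x - x0) ** 2) / (4.0 * sigma ** 2) + 1j * k0 * x)

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# |psi|^2 is a normal density with mean x0 and standard deviation sigma.
density = lambda x: abs(gaussian_packet(x, x0=-3.0)) ** 2
total_probability = trapezoid(density, -20.0, 20.0)
mean_x = trapezoid(lambda x: x * density(x), -20.0, 20.0)
```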

Streamlining Systems Engineering Computation

Systems engineering is a big field, but it’s one where the structure and capabilities of the Wolfram Language provide unique advantages, which over the past decade have allowed us to build out rather complete industrial-strength support for modeling, analysis and control design for a wide range of types of systems. It’s all an integrated part of the Wolfram Language, accessible through the computational and interface structure of the language. But it’s also integrated with our separate Wolfram System Modeler product, which provides a GUI-based workflow for system modeling and exploration.

Shared with System Modeler are large collections of domain-specific modeling libraries. And, for example, since Version 13, we’ve added libraries in areas such as battery engineering, hydraulic engineering and aircraft engineering—as well as educational libraries for mechanical engineering, thermal engineering, digital electronics, and biology. (We’ve also added libraries for areas such as business and public policy simulation.)

A typical workflow for systems engineering begins with the setting up of a model. The model can be built from scratch, or assembled from components in model libraries—either visually in Wolfram System Modeler, or programmatically in the Wolfram Language. For example, here’s a model of an electric motor that’s turning a load through a flexible shaft:

Once one’s got a model, one can then simulate it. Here’s an example where we’ve set one parameter of our model (the moment of inertia of the load), and we’re computing the values of two others as a function of time:

A new capability in Version 14.0 is being able to see the effect of uncertainty in parameters (or initial values, etc.) on the behavior of a system. So here, as an example, we’re saying the value of the parameter is not definite, but is instead distributed according to a normal distribution—then we’re seeing the distribution of output results:
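Propagating a parameter distribution through a model can be approximated by plain Monte Carlo sampling. The Python sketch below is only an illustration under our own assumptions: it replaces the motor-and-shaft model with a hypothetical first-order model J dω/dt = τ - bω (load inertia J, constant torque τ, viscous damping b), draws J from a normal distribution, and summarizes the resulting spread in speed at one time. We don't know which propagation method Version 14.0 uses internally.

```python
import math
import random
import statistics

def speed(t, inertia, torque=1.0, damping=0.5):
    """Speed at time t of the hypothetical model J dw/dt = torque - damping * w,
    starting from rest: w(t) = (torque / damping) * (1 - exp(-damping * t / J))."""
    return (torque / damping) * (1.0 - math.exp(-damping * t / inertia))

def propagate(t, mean_inertia=2.0, sd_inertia=0.2, samples=10_000, seed=1):
    """Sample the load inertia from a normal distribution and return the
    mean and standard deviation of the resulting speed at time t."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(samples):
        inertia = rng.gauss(mean_inertia, sd_inertia)
        if inertia <= 0.0:      # discard physically meaningless draws
            continue
        outputs.append(speed(t, inertia))
    return statistics.mean(outputs), statistics.stdev(outputs)

mean_w, sd_w = propagate(4.0)
print(f"speed at t = 4: mean {mean_w:.3f}, standard deviation {sd_w:.3f}")
```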

The motor with flexible shaft that we’re looking at can be thought of as a “multidomain system”, combining electrical and mechanical components. But the Wolfram Language (and Wolfram System Modeler) can also handle “mixed systems”, combining analog and digital (i.e. continuous and discrete) components. Here’s a fairly sophisticated example from the world of control systems: a helicopter model connected in a closed loop to a digital control system:

This whole model system can be represented symbolically just by:

And now we compute the input-output response of the model:

Here’s specifically the output response:

But now we can “drill in” and see specific subsystem responses, here of the zero-order hold device (labeled ZOH above)—complete with its little digital steps:
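What a zero-order hold does is simple to state: it samples its input every T seconds and holds that value until the next sample, which is what produces the staircase. A minimal Python sketch, with an arbitrary sine input and sampling period:

```python
import math

def zero_order_hold(signal, period, t):
    """Output of an ideal zero-order hold at time t: the input sampled at the
    most recent sampling instant k * period, with k = floor(t / period)."""
    k = math.floor(t / period)
    return signal(k * period)

# Sample a sine wave every 0.5 s and read the held output every 0.05 s.
ts = [i * 0.05 for i in range(41)]
held = [zero_order_hold(math.sin, 0.5, t) for t in ts]
```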

But what if we want to design the control systems ourselves? Well, in Version 14 we can now apply all our Wolfram Language control systems design functionality to arbitrary system models. Here’s an example of a simple model, in this case in chemical engineering (a continuously stirred tank):

Now we can take this model and design an LQG controller for it, then assemble a whole closed-loop system:

Now we can simulate the closed-loop system—and see that the controller succeeds in bringing the final value to 0:
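An LQG controller combines a linear-quadratic regulator (LQR) with a Kalman-filter state estimator. As an illustration of just the LQR half, assuming the full state is measured and replacing the stirred-tank model with a hypothetical scalar linear plant dx/dt = a x + b u, the optimal gain follows in closed form from the scalar algebraic Riccati equation 2ap - (b²/r)p² + q = 0, giving the control law u = -(bp/r) x. A Python sketch:

```python
import math

def scalar_lqr_gain(a, b, q, r):
    """Gain k for u = -k x minimizing the integral of q x^2 + r u^2 subject to
    dx/dt = a x + b u, using the positive root p of the scalar Riccati equation."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

def simulate(a, b, k, x0=1.0, dt=1e-3, t_end=10.0):
    """Forward-Euler simulation of the closed loop dx/dt = a x + b u, u = -k x."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        u = -k * x
        x += dt * (a * x + b * u)
    return x

a, b = 0.5, 1.0   # hypothetical plant; the open loop is unstable because a > 0
k = scalar_lqr_gain(a, b, q=1.0, r=1.0)
closed_loop_pole = a - b * k
x_final = simulate(a, b, k)
```

The closed-loop pole a - bk = -√(a² + b²q/r) is negative for any a, so the state decays to 0, which is the same qualitative behavior the closed-loop simulation above shows for the real model.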

Graphics: More Beautiful & Alive

Graphics have always been an important part of the story of the Wolfram Language, and for more than three decades we’ve been progressively enhancing and updating their appearance and functionality—sometimes with help from advances in hardware (e.g. GPU) capabilities.

Since Version 13 we’ve added a variety of “decorative” (or “annotative”) effects in 2D graphics. One example (useful for putting captions on things) is Haloing:

Another example is DropShadowing:

All of these are specified symbolically, and can be used throughout the system (e.g. in hover effects, etc.). And, yes, there are many detailed parameters you can set:

A significant new capability in Version 14.0 is convenient texture mapping. We’ve had low-level polygon-by-polygon textures for a decade and a half. But now in Version 14.0 we’ve made it straightforward to map textures onto whole surfaces. Here’s an example wrapping a texture onto a sphere:

And here’s wrapping the same texture onto a more complicated surface:

A significant subtlety is that there are many ways to map what amount to “texture coordinate patches” onto surfaces. The documentation illustrates new, named cases:

And now here’s what happens with stereographic projection onto a sphere:
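Two of the simplest ways to send 2D texture coordinates onto a sphere can be written down directly. In the Python sketch below (the function names are ours, and we don't claim they match the exact conventions of the named mappings in the documentation), spherical_uv wraps (u, v) around the sphere by longitude and latitude, and inverse_stereographic sends a point of the plane to the unit sphere by projecting from the north pole:

```python
import math

def spherical_uv(u, v):
    """Longitude-latitude wrapping: u in [0, 1) sets the longitude, and v in [0, 1]
    runs from the north pole (v = 0) to the south pole (v = 1)."""
    theta = 2.0 * math.pi * u
    phi = math.pi * v
    return (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))

def inverse_stereographic(x, y):
    """Map a plane point onto the unit sphere, projecting from the north pole
    (0, 0, 1): the origin lands on the south pole, the unit circle on the equator."""
    d = 1.0 + x * x + y * y
    return (2.0 * x / d, 2.0 * y / d, (x * x + y * y - 1.0) / d)
```

Points far from the origin crowd toward the north pole under the stereographic map, which is why a texture looks strongly distorted near that pole.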

Here’s an example of “surface texture” for the planet Venus

and here it’s been mapped onto a sphere, which can be rotated:

Here’s a “flowerified” bunny:

Things like texture mapping help make graphics visually compelling. Since Version 13 we’ve also added a variety of “live visualization” capabilities that automatically “bring visualizations to life”. For example, any plot now by default has a “coordinate mouseover”:

As usual, there are lots of ways to control such “highlighting” effects:

Euclid Redux: The Advance of Synthetic Geometry

One might say it’s been two thousand years in the making. But four years ago (Version 12) we began to introduce a computable version of Euclid-style synthetic geometry.

The idea is to specify geometric scenes symbolically by giving a collection of (potentially implicit) constraints:

We can then generate a random instance of geometry consistent with the constraints—and in Version 14 we’ve considerably enhanced our ability to make sure that geometry will be “typical” and non-degenerate:

But now a new feature of Version 14 is that we can find values of geometric quantities that are determined by the constraints:

Here’s a slightly more complicated case:

And here we’re now solving for the areas of two triangles in the figure:
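A small example of a quantity “determined by the constraints”: if AB is a diameter of a circle and C is any other point on that circle, the angle ACB is always 90° (Thales' theorem), however the scene is randomly instantiated. The Python sketch below checks this numerically over random instances; it is only a spot-check with constraints we chose, not the symbolic approach the Wolfram Language uses.

```python
import math
import random

def angle_at(c, a, b):
    """Angle ACB in degrees."""
    v1 = (a[0] - c[0], a[1] - c[1])
    v2 = (b[0] - c[0], b[1] - c[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))

def random_instance(rng):
    """Random scene satisfying the constraints: A and B are opposite ends of a
    diameter of the unit circle, and C is another point on the same circle."""
    while True:
        t = rng.uniform(0.0, 2.0 * math.pi)
        a = (math.cos(t), math.sin(t))
        b = (-a[0], -a[1])
        s = rng.uniform(0.0, 2.0 * math.pi)
        c = (math.cos(s), math.sin(s))
        # Reject near-degenerate instances where C (almost) coincides with A or B.
        if min(math.dist(c, a), math.dist(c, b)) > 1e-3:
            return a, b, c

rng = random.Random(42)
angles = [angle_at(c, a, b) for a, b, c in (random_instance(rng) for _ in range(5))]
```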

We’ve always been able to give explicit styles for particular elements of a scene:

Now one of the new features in Version 14 is being able to give general “geometric styling rules”, here just assigning random colors to each element:

The Ever-Smoother User Interface

Our goal with Wolfram Language is to make it as easy as possible to express oneself computationally. And a big part of achieving that is the coherent design of the language itself. But there’s another part as well, which is being able to actually enter Wolfram Language input one wants—say in a notebook—as easily as possible. And with every new version we make enhancements to this.

One area that’s been in continuous development is interactive syntax highlighting. We first added syntax highlighting nearly two decades ago—and over time we’ve progressively made it more and more sophisticated, responding both as you type, and as code gets executed. Some highlighting has always had obvious meaning. But particularly highlighting that is dynamic and based on cursor position has sometimes been harder to interpret. And in Version 14—leveraging the brighter color palettes that have become the norm in recent years—we’ve tuned our dynamic highlighting so it’s easier to quickly tell “where you are” within the structure of an expression:

On the subject of “knowing what one has”, another enhancement, added in Version 13.2, is differentiated frame coloring for different kinds of visual objects in notebooks. Is that thing one has a graphic? Or an image? Or a graph? Now one can tell from the color of the frame when one selects it:

An important aspect of the Wolfram Language is that the names of built-in functions are spelled out enough that it’s easy to tell what they do. But often the names are therefore necessarily quite long, and so it’s important to be able to autocomplete them when one’s typing. In Version 13.3 we added the notion of “fuzzy autocompletion” that not only “completes to the end” a name one’s typing, but can also fill in intermediate letters, change capitalization, etc. Thus, for example, just typing lll brings up an autocompletion menu that begins with ListLogLogPlot:
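We don't know the exact matching and ranking rules behind the notebook's fuzzy autocompletion. The Python sketch below is one hypothetical scheme, subsequence matching plus a preference for names whose capital-letter initials start with what was typed, that reproduces the lll example on a small, made-up candidate list:

```python
def is_subsequence(query, name):
    """True if the letters of query occur in name in order, ignoring case."""
    remaining = iter(name.lower())
    return all(ch in remaining for ch in query.lower())

def initials(name):
    """Capital letters of a CamelCase name, e.g. 'ListLogLogPlot' -> 'LLLP'."""
    return "".join(ch for ch in name if ch.isupper())

def fuzzy_complete(query, names):
    """Hypothetical ranking: keep subsequence matches; put names whose initials
    start with the query first, then prefer shorter names, then alphabetical."""
    q = query.lower()
    matches = [n for n in names if is_subsequence(q, n)]
    return sorted(matches, key=lambda n: (not initials(n).lower().startswith(q), len(n), n))

names = ["Plot", "ListPlot", "ListLinePlot", "ListLogPlot", "ListLogLogPlot", "Length"]
print(fuzzy_complete("lll", names))   # ['ListLogLogPlot', 'ListLogPlot', 'ListLinePlot']
```

The `ch in remaining` idiom consumes the iterator as it searches, so each letter of the query must be found after the previous one.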

A major user interface update that first appeared in Version 13.1—and has been enhanced in subsequent versions—is a default toolbar for every notebook:

The toolbar provides immediate access to evaluation controls, cell formatting and various kinds of input (like inline cells, , hyperlinks, drawing canvas, etc.)—as well as to things like cloud publishing, documentation search and “chat” (i.e. LLM) settings.

Much of the time, it’s useful to have the toolbar displayed in any notebook you’re working with. But on the left-hand side there’s a tiny button that lets you minimize the toolbar:

In 14.0 there’s a Preferences setting that makes the toolbar come up minimized in any new notebook you create—and this in effect gives you the best of both worlds: you have immediate access to the toolbar, but your notebooks don’t have anything “extra” that might distract from their content.

Another thing that’s advanced since Version 13 is the handling of “summary” forms of output in notebooks. A basic example is what happens if you generate a very large result. By default only a summary of the result is actually displayed. But now there’s a bar at the bottom that gives various options for how to handle the actual output:

By default, the output is only stored in your current kernel session. But by pressing the Iconize button you get an iconized form that will appear directly in your notebook (or one that can be copied anywhere) and that “has the whole output inside”. There’s also a Store full expression in notebook button, which will “invisibly” store the output expression “behind” the summary display.

If the expression is stored in the notebook, then it’ll be persistent across kernel sessions. Otherwise, well, you won’t be able to get to it in a different kernel session; the only thing you’ll have is the summary display:

It’s a similar story for large “computational objects”. For example, here’s a Nearest function with a million data points:

By default, the data is just something that exists in your current kernel session. But now there’s a menu that lets you save the data in various persistent locations:

And There’s the Cloud Too

There are many ways to run the Wolfram Language. Even in Version 1.0 we had the notion of remote kernels: the notebook front end running on one machine (in those days essentially always a Mac, or a NeXT), and the kernel running on a different machine (in those days sometimes even connected by phone lines). But a decade ago came a major step forward: the Wolfram Cloud.

There are really two distinct ways in which the cloud is used. The first is in delivering a notebook experience similar to our longtime desktop experience, but running purely in a browser. And the second is in delivering APIs and other programmatically accessed capabilities—notably, even at the beginning, a decade ago, through things like APIFunction.

The Wolfram Cloud has been the target of intense development now for nearly 15 years. Alongside it have also come Wolfram Application Server and Wolfram Web Engine, which provide more streamlined support specifically for APIs (without things like user management, etc., but with things like clustering).

All of these—but particularly the Wolfram Cloud—have become core technology capabilities for us, supporting many of our other activities. So, for example, the Wolfram Function Repository and Wolfram Paclet Repository are both based on the Wolfram Cloud (and in fact this is true of our whole resource system). And when we came to build the Wolfram plugin for ChatGPT earlier this year, using the Wolfram Cloud allowed us to have the plugin deployed within a matter of days.

Since Version 13 there have been quite a few very different applications of the Wolfram Cloud. One is for the function ARPublish, which takes 3D geometry and puts it in the Wolfram Cloud with appropriate metadata to allow phones to get augmented-reality versions from a QR code of a cloud URL:

On the Cloud Notebook side, there’s been a steady increase in usage, notably of embedded Cloud Notebooks, which have for example become common on Wolfram Community, and are used all over the Wolfram Demonstrations Project. Our goal all along has been to make Cloud Notebooks be as easy to use as simple webpages, but to have the depth of capabilities that we’ve developed in notebooks over the past 35 years. We achieved this some years ago for fairly small notebooks, but in the past couple of years we’ve been going progressively further in handling even multi-hundred-megabyte notebooks. It’s a complicated story of caching, refreshing—and dodging the vicissitudes of web browsers. But at this point the vast majority of notebooks can be seamlessly deployed to the cloud, and will display as immediately as simple webpages.

The Great Integration Story for External Code

It’s been possible to call external code from Wolfram Language ever since Version 1.0. But in Version 14 there are important advances in the extent and ease with which external code can be integrated. The overall goal is to be able to use all the power and coherence of the Wolfram Language even when some part of a computation is done in external code. And in Version 14 we’ve done a lot to streamline and automate the process by which external code can be integrated into the language.

Once something is integrated into the Wolfram Language it just becomes, for example, a function that can be used just like any other Wolfram Language function. But what’s underneath is necessarily quite different for different kinds of external code. There’s one setup for interpreted languages like Python. There’s another for C-like compiled languages and dynamic libraries. (And then there are others for external processes, APIs, and what amount to “importable code specifications”, say for neural networks.)

Let’s start with Python. We’ve had ExternalEvaluate for evaluating Python code since 2018. But when you actually come to use Python there are all these dependencies and libraries to deal with. And, yes, that’s one of the places where the incredible advantages of the Wolfram Language and its coherent design are painfully evident. But in Version 14.0 we now have a way to encapsulate all that Python complexity, so that we can deliver Python functionality within Wolfram Language, hiding all the messiness of Python dependencies, and even the versioning of Python itself.

As an example, let’s say we want to make a Wolfram Language function Emojize that uses the Python function emojize within the emoji Python library. Here’s how we can do that:

And now you can just call Emojize in the Wolfram Language and—under the hood—it’ll run Python code:

The way this works is that the first time you call Emojize, a Python environment with all the right features is created, then is cached for subsequent uses. And what’s important is that the Wolfram Language specification of Emojize is completely system independent (or as system independent as it can be, given vicissitudes of Python implementations). So that means that you can, for example, deploy Emojize in the Wolfram Function Repository just like you would deploy something written purely in Wolfram Language.
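The “create on first call, then reuse” behavior can be sketched with memoization. In the Python sketch below, get_environment, the requirements tuple and the small lookup table are hypothetical stand-ins: no real Python environment is built and the actual emoji library is not used. The point is only that the expensive setup runs once.

```python
import functools

creation_count = 0   # how many times the (hypothetical) environment was built

@functools.lru_cache(maxsize=None)
def get_environment(requirements):
    """Pretend to build an isolated environment for a tuple of requirements.
    lru_cache makes every later call with the same requirements reuse it."""
    global creation_count
    creation_count += 1
    # Stand-in for the real library: a tiny lookup table, not the emoji package.
    table = {":thumbs_up:": "\N{THUMBS UP SIGN}", ":sparkles:": "\N{SPARKLES}"}
    return {"requirements": requirements, "table": table}

def emojize(text):
    env = get_environment(("emoji",))   # built on the first call only
    for code, char in env["table"].items():
        text = text.replace(code, char)
    return text

first = emojize("Ship it :thumbs_up:")
second = emojize("Done :sparkles:")
print("environments built:", creation_count)   # environments built: 1
```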

There’s very different engineering involved in calling C-compatible functions in dynamic libraries. But in Version 13.3 we also made this very streamlined using the function ForeignFunctionLoad. There’s all sorts of complexity associated with converting to and from native C data types, managing memory for data structures, etc. But we’ve now got very clean ways to do this in Wolfram Language.

As an example, here’s how one sets up a “foreign function” call to a function RAND_bytes in the OpenSSL library:

Inside this, we’re using Wolfram Language compiler technology to specify the native C types that will be used in the foreign function. But now we can package this all up into a Wolfram Language function:

And we can call this function just like any other Wolfram Language function:
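For comparison, Python's standard ctypes module can make the same kind of call to OpenSSL's `int RAND_bytes(unsigned char *buf, int num)`, which returns 1 on success. This is a rough analogue, not how ForeignFunctionLoad works. It assumes a shared libcrypto can be located with ctypes.util.find_library; on some systems, notably recent macOS, the unversioned system libcrypto may refuse to be loaded this way.

```python
import ctypes
import ctypes.util

def load_rand_bytes():
    """Return OpenSSL's RAND_bytes as a ctypes function, or None if no
    libcrypto can be located on this system."""
    path = ctypes.util.find_library("crypto")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    fn = lib.RAND_bytes
    fn.argtypes = [ctypes.c_void_p, ctypes.c_int]   # unsigned char *buf, int num
    fn.restype = ctypes.c_int                       # 1 on success
    return fn

def random_bytes(n, rand_bytes=None):
    """Fill a freshly allocated n-byte buffer via RAND_bytes and return it as bytes."""
    fn = rand_bytes or load_rand_bytes()
    if fn is None:
        raise OSError("libcrypto not found")
    buf = ctypes.create_string_buffer(n)   # memory owned (and freed) by the Python object
    if fn(buf, n) != 1:
        raise RuntimeError("RAND_bytes failed")
    return buf.raw

if __name__ == "__main__":
    rand_fn = load_rand_bytes()
    if rand_fn is None:
        print("libcrypto not found; nothing to demonstrate")
    else:
        print(random_bytes(16, rand_fn).hex())
```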

Internally, all sorts of complicated things are going on. For example, we’re allocating a raw memory buffer that’s then getting fed to our C function. But when we do that memory allocation we’re creating a symbolic structure that defines it as a “managed object”:

And now when this object is no longer being used, the memory associated with it will be automatically freed.
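Tying the release of memory to an object's lifetime has a close analogue in Python's weakref.finalize, which runs a callback once the object has been garbage collected. A minimal sketch; ManagedBuffer and the freed list are hypothetical names, and a bytearray stands in for memory obtained from a C allocator:

```python
import gc
import weakref

freed = []   # records the size of every buffer whose cleanup has run

class ManagedBuffer:
    """Hypothetical buffer whose release is tied to the lifetime of the object."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)   # stands in for memory from a C allocator
        # The callback must not refer to self, or the object could never be collected.
        self._finalizer = weakref.finalize(self, freed.append, size)

buf = ManagedBuffer(1024)
del buf          # drop the last reference...
gc.collect()     # ...and force a collection (CPython already frees it at the del)
```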

And, yes, with both Python and C there’s quite a bit of complexity underneath. But the good news is that in Version 14 we’ve basically been able to automate handling it. And the result is that what gets exposed is pure, simple Wolfram Language.

But there’s another big piece to this. Within particular Python or C libraries there are often elaborate definitions of data structures that are specific to that library. And so to use these libraries one has to dive into all the—potentially idiosyncratic—complexities of those definitions. But in the Wolfram Language we have consistent symbolic representations for things, whether they’re images, or dates or types of chemicals. When you first hook up an external library you have to map its data structures to these. But once that’s done, anyone can use what’s been built, and seamlessly integrate with other things they’re doing, perhaps even calling other external code. In effect what’s happening is that one’s leveraging the whole design framework of the Wolfram Language, and applying that even when one’s using underlying implementations that aren’t based on the Wolfram Language.

For Serious Developers

A single line (or less) of Wolfram Language code can do a lot. But one of the remarkable things about the language is that it’s fundamentally scalable: good both for very short programs and very long programs. And since Version 13 there’ve been several advances in handling very long programs. One of them concerns “code editing”.

Standard Wolfram Notebooks work very well for exploratory, expository and many other forms of work. And it’s certainly possible to write large amounts of code in standard notebooks (and, for example, I personally do it). But when one’s doing “software-engineering-style work” it’s both more convenient and more familiar to use what amounts to a pure code editor, largely separate from code execution and exposition. And this is why we have the “package editor”, accessible from File > New > Package/Script. You’re still operating in the notebook environment, with all its sophisticated capabilities. But things have been “skinned” to provide a much more textual “code experience”—both in terms of editing, and in terms of what actually gets saved in .wl files.

Here’s a typical example of the package editor in action (in this case applied to our GitLink package):

Several things are immediately evident. First, it’s very line oriented. Lines (of code) are numbered, and don’t break except at explicit newlines. There are headings just like in ordinary notebooks, but when the file is saved, they’re stored as comments with a certain stylized structure:
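Because those stylized comments are plain text, other tools can process them. The Python sketch below assumes the layout we've seen in saved .wl files, a marker comment such as `(* ::Section:: *)` followed on the next line by the heading text in its own comment, and extracts the headings with a regular expression:

```python
import re

# Marker comment "(* ::Style:: *)", then a newline, then "(*Heading text*)".
HEADING = re.compile(r"\(\* ::(\w+):: \*\)\s*\n\(\*(\S[^\n]*?)\*\)")

def headings(source):
    """Return (style, text) pairs for stylized heading comments in .wl source."""
    return [(m.group(1), m.group(2)) for m in HEADING.finditer(source)]

example = """(* ::Package:: *)

(* ::Section:: *)
(*Initialization*)

BeginPackage["Demo`"]

(* ::Subsection:: *)
(*Helpers*)
"""

print(headings(example))   # [('Section', 'Initialization'), ('Subsection', 'Helpers')]
```

The `\S` at the start of the heading group keeps the `(* ::Package:: *)` marker from being mistaken for a heading followed by the next marker comment.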

It’s still perfectly possible to run code in the package editor, but the output won’t get saved in the .wl file:

One thing that’s changed since Version 13 is that the toolbar is much enhanced. And for example there’s now “smart search” that is aware of code structure:

You can also ask to go to a line number—and you’ll immediately see whatever lines of code are nearby:

In addition to code editing, another set of features new since Version 13 that matters to serious developers concerns automated testing. The main advance is the introduction of a fully symbolic testing framework, in which individual tests are represented as symbolic objects

and can be manipulated in symbolic form, then run using functions like TestEvaluate and TestReport:
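The core idea, that a test is a value you can build, filter and pass around before anything runs, can be sketched outside the Wolfram Language too. In this Python sketch, SymbolicTest, evaluate and report are hypothetical names loosely modeled on the roles of symbolic test objects, TestEvaluate and TestReport, and the outcome labels are our own:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class SymbolicTest:
    """A test held as data: nothing is computed until it is evaluated."""
    name: str
    compute: Callable[[], Any]
    expected: Any

def evaluate(test):
    """Run one test and return a small result record."""
    try:
        actual = test.compute()
        outcome = "Success" if actual == test.expected else "Failure"
    except Exception as exc:   # an exception becomes an "Error" outcome, not a crash
        actual, outcome = exc, "Error"
    return {"name": test.name, "outcome": outcome, "actual": actual}

def report(tests):
    """Evaluate a list of tests and tally the outcomes."""
    results = [evaluate(t) for t in tests]
    counts = {"Success": 0, "Failure": 0, "Error": 0}
    for r in results:
        counts[r["outcome"]] += 1
    return counts, results

tests = [
    SymbolicTest("addition", lambda: 1 + 1, 2),
    SymbolicTest("string repeat", lambda: "ab" * 2, "abab"),
    SymbolicTest("deliberate failure", lambda: sorted([3, 1, 2]), [3, 2, 1]),
]
# Because tests are ordinary values, they can be filtered before anything runs.
quick = [t for t in tests if t.name != "string repeat"]
counts, _ = report(tests)
```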

In Version 14.0 there’s another new testing function—IntermediateTest—that lets you insert what amount to checkpoints inside larger tests:

Evaluating this test, we see that the intermediate tests were also run:
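Checkpoints inside a larger test can be sketched in the same spirit: the test body records intermediate checks as it goes, and the overall test passes only if all of them and the final check pass. This Python sketch uses hypothetical names (Checkpoints, run_test, pipeline_test) and is not the IntermediateTest API:

```python
class Checkpoints:
    """Collects the results of intermediate checks made inside one larger test."""

    def __init__(self):
        self.results = []

    def check(self, label, actual, expected):
        ok = actual == expected
        self.results.append((label, ok))
        return ok

def run_test(body):
    """Run a test body that receives a Checkpoints object; the test passes
    only if the final check and every intermediate check pass."""
    cp = Checkpoints()
    final_ok = body(cp)
    passed = final_ok and all(ok for _, ok in cp.results)
    return passed, cp.results

def pipeline_test(cp):
    cleaned = sorted([3, 1, 2])
    cp.check("sorted", cleaned, [1, 2, 3])   # checkpoint partway through
    total = sum(cleaned)
    cp.check("sum", total, 6)                # another checkpoint
    return total * 2 == 12                   # final check

passed, intermediate = run_test(pipeline_test)
```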

Wolfram Function Repository: 2900 Functions & Counting

The Wolfram Function Repository has been a big success. We introduced it in 2019 as a way to make specific, individual contributed functions available in the Wolfram Language. And now there are more than 2900 such functions in the Repository.

The nearly 7000 functions that constitute the Wolfram Language as it is today have been painstakingly developed over the past three and a half decades, always mindful of creating a coherent whole with consistent design principles. And now in a sense the success of the Function Repository is one of the dividends of all that effort. Because it’s the coherence and consistency of the underlying language and its design principles that make it feasible to just add one function at a time, and have it really work. You want to add a function to do some very specific operation that combines images and graphs. Well, there’s a consistent representation of both images and graphs in the Wolfram Language, which you can leverage. And by following the principles of the Wolfram Language—like for the naming of functions—you can create a function that’ll be easy for Wolfram Language users to understand and use.

Using the Wolfram Function Repository is a remarkably seamless process. If you know the function’s name, you can just call it using ResourceFunction; the function will be loaded if it’s needed, and then it’ll just run:

If there’s an update available for the function, it’ll give you a message, but run the old version anyway. The message has a button that lets you load in the update; then you can rerun your input and use the new version. (If you’re writing code where you want to “burn in” a particular version of a function, you can just use the ResourceVersion option of ResourceFunction.)

If you want your code to look more elegant, just evaluate the ResourceFunction object

and use the formatted version:

And, by the way, pressing the + then gives you more information about the function:

An important feature of functions in the Function Repository is that they all have documentation pages—that are organized pretty much like the pages for built-in functions:

But how does one create a Function Repository entry? Just go to File > New > Repository Item > Function Repository Item and you’ll get a Definition Notebook:

We’ve optimized this to be as easy to fill in as possible, minimizing boilerplate and automatically checking for correctness and consistency whenever possible. And the result is that it’s perfectly realistic to create a simple Function Repository item in under an hour, with most of the time spent writing good expository examples.

When you press Submit to Repository your function gets sent to the Wolfram Function Repository review team, whose mandate is to ensure that functions in the repository do what they say they do, work in a way that is consistent with general Wolfram Language design principles, have good names, and are adequately documented. Except for very specialized functions, the goal is to finish reviews within a week (and sometimes considerably sooner)—and to publish functions as soon as they are ready.

There’s a digest of new (and updated) functions in the Function Repository that gets sent out every Friday—and makes for interesting reading (you can subscribe here):

The Wolfram Function Repository is a curated public resource that can be accessed from any Wolfram Language system (and, by the way, the source code for every function is available—just press the Source Notebook button). But there’s another important use case for the infrastructure of the Function Repository: privately deployed “resource functions”.

It all works through the Wolfram Cloud. You use the exact same Definition Notebook, but now instead of submitting to the public Wolfram Function Repository, you just deploy your function to the Wolfram Cloud. You can make it private so that only you, or some specific group, can access it. Or you can make it public, so anyone who knows its URL can immediately access and use it in their Wolfram Language system.

This turns out to be a tremendously useful mechanism, both for group projects, and for creating published material. In a sense it’s a very lightweight but robust way to distribute code—packaged into functions that can immediately be used. (By the way, to find the functions you’ve published from your Wolfram Cloud account, just go to the DeployedResources folder in the cloud file browser.)

(For organizations that want to manage their own function repository, it’s worth mentioning that the whole Wolfram Function Repository mechanism—including the infrastructure for doing reviews, etc.—is also available in a private form through the Wolfram Enterprise Private Cloud.)

So what’s in the public Wolfram Function Repository? There are a lot of “specialty functions” intended for specific “niche” purposes—but very useful if they’re what you want:

There are functions that add various kinds of visualizations:

Some functions set up user interfaces:

Some functions link to external services:

Some functions provide simple utilities:

There are also functions that are being explored for potential inclusion in the core system:

There are also lots of “leading-edge” functions, added as part of research or exploratory development. And for example in pieces I write (including this one), I make a point of having all pictures and other output be backed by “click-to-copy” code that reproduces them—and this code quite often contains functions either from the public Wolfram Function Repository or from (publicly accessible) private deployments.

The Paclet Repository Arrives

Paclets are a technology we’ve used for more than a decade and a half to distribute updated functionality to Wolfram Language systems in the field. In Version 13 we began the process of providing tools for anyone to create paclets. And since Version 13 we’ve introduced the Wolfram Language Paclet Repository as a centralized repository for paclets:

What is a paclet? It’s a collection of Wolfram Language functionality—including function definitions, documentation, external libraries, stylesheets, palettes and more—that can be distributed as a unit, and immediately deployed in any Wolfram Language system.

The Paclet Repository is a centralized place where anyone can publish paclets for public distribution. So how does this relate to the Wolfram Function Repository? They are interestingly complementary, with different optimizations and different setups. The Function Repository is more lightweight, the Paclet Repository more flexible. The Function Repository is for making available individual new functions that independently fit into the whole existing structure of the Wolfram Language. The Paclet Repository is for making available larger-scale pieces of functionality that can define a whole framework and environment of their own.

The Function Repository is also fully curated, with every function being reviewed by our team before it is posted. The Paclet Repository is an immediate-deployment system, without pre-publication review. In the Function Repository every function is specified just by its name—and our review team is responsible for ensuring that names are well chosen and have no conflicts. In the Paclet Repository, every contributor gets their own namespace, and all their functions and other material live inside that namespace. So, for example, I contributed the function RandomHypergraph to the Function Repository, which can be accessed just as ResourceFunction["RandomHypergraph"]. But if I had put this function in a paclet in the Paclet Repository, it would have to be accessed as something like PacletSymbol["StephenWolfram/Hypergraphs", "RandomHypergraph"].

PacletSymbol, by the way, is a convenient way of “deep accessing” individual functions inside a paclet. PacletSymbol temporarily installs (and loads) a paclet so that you can access a particular symbol in it. But more often one wants to permanently install a paclet (using PacletInstall), then explicitly load its contents (using Needs) whenever one wants to have its symbols available. (All the various ancillary elements, like documentation, stylesheets, etc. in a paclet get set up when it is installed.)

What does a paclet look like in the Paclet Repository? Every paclet has a home page that typically includes an overall summary, a guide to the functions in the paclet, and some overall examples of the paclet:

Individual functions typically have their own documentation pages:

Just like in the main Wolfram Language documentation, there can be a whole hierarchy of guide pages, and there can be things like tutorials.

Notice that in examples in paclet documentation, one often sees constructs like . These represent symbols in the paclet, presented in forms like PacletSymbol["WolframChemistry/ProteinVisualization", "AmidePlanePlot"] that allow these symbols to be accessed in a “standalone” way. If you directly evaluate such a form, by the way, it’ll force (temporary) installation of the paclet, then return the actual, raw symbol that appears in the paclet:

So how does one create a paclet suitable for submission to the Paclet Repository? You can do it purely programmatically, or you can start from File > New > Repository Item > Paclet Repository Item, which launches what amounts to a whole paclet creation IDE. The first step is to specify where you want to assemble your paclet. You give some basic information:

Then a Paclet Resource Definition Notebook is created, from which you can give function definitions, set up documentation pages, specify what you want your paclet’s home page to be like, etc.:

There are lots of sophisticated tools that let you create full-featured paclets with the same kind of breadth and depth of capabilities that you find in the Wolfram Language itself. For example, Documentation Tools lets you construct full-featured documentation pages (function pages, guide pages, tutorials, …):

Once you’ve assembled a paclet, you can check it, build it, deploy it privately—or submit it to the Paclet Repository. And once you submit it, it will automatically get set up on the Paclet Repository servers, and within just a few minutes the pages you’ve created describing your paclet will show up on the Paclet Repository website.

So what’s in the Paclet Repository so far? There’s a lot of good and very serious stuff, contributed both by teams at our company and by members of the broader Wolfram Language community. In fact, many of the 134 paclets now in the Paclet Repository have enough in them that one could write a whole piece like this about each of them.

One category of things you’ll find in the Paclet Repository is snapshots of our ongoing internal development projects—many of which will eventually become built-in parts of the Wolfram Language. A good example of this is our LLM and Chat Notebook functionality, whose rapid development and deployment over the past year was made possible by the use of the Paclet Repository. Another example, representing ongoing work from our chemistry team (AKA WolframChemistry in the Paclet Repository), is the ChemistryFunctions paclet, which contains functions like:

And, yes, this is interactive:

Or, also from WolframChemistry:

Another “development snapshot” is DiffTools—a paclet for making and viewing diffs between strings, cells, notebooks, etc.:

A major paclet is QuantumFramework—which provides the functionality for our Wolfram Quantum Framework

and delivers broad support for quantum computing (with at least a few connections to multiway systems and our Physics Project):

Talking of our Physics Project, there are over 200 functions in the Wolfram Function Repository that support it. But there are also paclets, like WolframInstitute/Hypergraph:

An example of an externally contributed package is Automata—with more than 250 functions for doing computations related to finite automata:

Another contributed paclet is FunctionalParsers, which goes from a symbolic parser specification to an actual parser, here being used in a reverse mode to generate random “sentences”:

Phi4Tools is a more specialized paclet, for working with Feynman diagrams in field theory:

And, as another example, here’s MaXrd, for crystallography and x-ray scattering:

As just one more example, there’s the Organizer paclet—a utility paclet for making and manipulating organizer notebooks. But unlike the other paclets we’ve seen here, it doesn’t expose any Wolfram Language functions; instead, when you install it, it puts a palette in your Palettes list:

Coming Attractions

As of today, Version 14 is finished, and out in the world. So what’s next? We have lots of projects underway—some already with years of development behind them. Some extend and strengthen what’s already in the Wolfram Language; some take it in new directions.

One major focus is broadening and streamlining the deployment of the language: unifying the way it’s delivered and installed on computers, packaging it so it can be efficiently integrated into other standalone applications, etc.

Another major focus is expanding the handling of very large amounts of data by the Wolfram Language—and seamlessly integrating out-of-core and lazy processing.

Then of course there’s algorithmic development. Some is “classical”, directly building on the towers of functionality we’ve developed over the decades. Some is more “AI based”. We’ve been creating heuristic algorithms and meta-algorithms ever since Version 1.0—increasingly using methods from machine learning. How far will neural net methods go? We don’t know yet. We’re routinely using them in things like algorithm selection. But to what extent can they help in the heart of algorithms?

I’m reminded of something we did back in 1987 in developing Version 1.0. There was a long tradition in numerical analysis of painstakingly deriving series approximations for particular cases of mathematical functions. But we wanted to be able to compute hundreds of different functions to arbitrary precision for any complex values of their arguments. So how did we do it? We generalized from series to rational approximations, and then, in a very “machine-learning-esque” way, we spent months of CPU time systematically optimizing these approximations. Well, we’ve been trying to do the same kind of thing again—though now over more ambitious domains—and now using not rational functions but large neural nets as our basis.

We’ve also been exploring using neural nets to “control” precise algorithms, in effect making heuristic choices which either guide or can be validated by the precise algorithms. So far, none of what we’ve produced has outperformed our existing methods, but it seems plausible that fairly soon it will.

We’re doing a lot with various aspects of metaprogramming. There’s the project of getting LLMs to help in the construction of Wolfram Language code—and in giving comments on it, and in analyzing what went wrong if the code didn’t do what one expected. Then there’s code annotation—where LLMs may help in doing things like predicting the most likely type for something. And there’s code compilation. We’ve been working for many years on a full-scale compiler for the Wolfram Language, and in every version what we have becomes progressively more capable. We’ve been doing some level of automatic compilation in specific cases (especially ones involving numerical computation) for more than 30 years. And eventually full-scale automatic compilation will be possible for everything. But as of now some of the biggest payoffs from our compiler technology have been for our internal development, where we can now get optimal down-to-the-metal performance simply from compiled (albeit carefully written) Wolfram Language code.

One of the big lessons of the surprising success of LLMs is that there’s potentially more structure in meaningful human language than we thought. I’ve long been interested in creating what I’ve called a “symbolic discourse language” that gives a computational representation of everyday discourse. The LLMs haven’t explicitly done that. But they encourage the idea that it should be possible, and they also provide practical help in doing it. And whether the goal is to be able to represent narrative text, or contracts, or textual specifications, it’s a matter of extending the computational language we’ve built to encompass more kinds of concepts and structures.

There are typically several kinds of drivers for our continued development efforts. Sometimes it’s a question of continuing to build a tower of capabilities in some known direction (like, for example, solving PDEs). Sometimes the tower we’ve built suddenly lets us see new possibilities. Sometimes when we actually use what we’ve built we realize there’s an obvious way to polish or extend it—or to “double down” on something that we can now see is valuable. And then there are cases where things happening in the technology world suddenly open up new possibilities—like LLMs have recently done, and perhaps XR will eventually do. And finally there are cases where new science-related insights suggest new directions.

I had assumed that our Physics Project would at best have practical applications only centuries hence. But in fact it’s become clear that the correspondence it’s defined between physics and computation gives us quite immediate new ways to think about aspects of practical computation. And indeed we’re now actively exploring how to use this to define a new level of parallel and distributed computation in the Wolfram Language, as well as to represent symbolically not only the results of computations but also the ongoing process of computation.

One might think that after nearly four decades of intense development there wouldn’t be anything left to do in developing the Wolfram Language. But in fact at every level we reach, there’s ever more that becomes possible, and ever more that we can see might be possible. And indeed this moment is a particularly fertile one, with an unprecedentedly broad waterfront of possibilities. Version 14 is an important and satisfying waypoint. But there are wonderful things ahead—as we continue our long-term mission to make the computational paradigm achieve its potential, and to build our computational language to help that happen.

Download your 14 now! » (It’s already live in the Wolfram Cloud!)

Observer Theory

Stephen Wolfram — Mon, 11 Dec 2023 20:44:16 +0000

The Concept of the Observer

We call it perception. We call it measurement. We call it analysis. But in the end it’s about how we take the world as it is, and derive from it the impression of it that we have in our minds.

We might have thought that we could do science “purely objectively” without any reference to observers or their nature. But what we’ve discovered particularly dramatically in our Physics Project is that the nature of us as observers is critical even in determining the most fundamental laws we attribute to the universe.

But what ultimately does an observer—say like us—do? And how can we make a theoretical framework for it? Much as we have a general model for the process of computation—instantiated by something like a Turing machine—we’d like to have a general model for the process of observation: a general “observer theory”.

Central to what we think of as an observer is the notion that the observer will take the raw complexity of the world and extract from it some reduced representation suitable for a finite mind. There might be zillions of photons impinging on our eyes, but all we extract is the arrangement of objects in a visual scene. Or there might be zillions of gas molecules impinging on a piston, yet all we extract is the overall pressure of the gas.

In the end, we can think of it fundamentally as being about equivalencing. There are immense numbers of different individual configurations for the photons or the gas molecules, all of which are treated as equivalent by an observer who’s just picking out the particular features needed for some reduced representation.
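As a toy illustration of this equivalencing (our own simplification, not a model from the article), imagine n gas molecules that can each sit in either the left or right half of a container, and an observer that records only how many are on the left. The Python sketch below enumerates every detailed configuration and counts how many are treated as equivalent for each reduced observation; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def observe(config):
    """Toy observer: reduce a full configuration (1 = left, 0 = right) to one number."""
    return sum(config)


def equivalence_classes(n):
    """Group all 2**n configurations of n two-state molecules by what the observer records."""
    return Counter(observe(c) for c in product((0, 1), repeat=n))


if __name__ == "__main__":
    n = 16
    classes = equivalence_classes(n)
    print(f"{2**n} configurations -> {len(classes)} distinct observations")  # 65536 -> 17
    print("configurations equivalenced to 'half on the left':", classes[n // 2])  # 12870
```

Even with only 16 molecules, 65,536 detailed states collapse to 17 observations, and 12,870 of them are indistinguishable to this observer as “half on the left”.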

There’s in a sense a certain duality between computation and observation. In computation one’s generating new states of a system. In observation, one’s equivalencing together different states.

That equivalencing must in the end be implemented “underneath” by computation. But in observer theory what we want to do is just characterize the equivalencing that’s achieved. For us as observers it might in practice be all about how our senses work, what our biological or cultural nature is—or what technological devices or structures we’ve built. But what makes a coherent concept of observer theory possible is that there seem to be general, abstract characterizations that capture the essence of different kinds of observers.

It’s not immediately obvious that anything suitable for a finite mind could ever be extracted from the complexity of the world. And indeed the Principle of Computational Equivalence implies that computational irreducibility (and its multicomputational generalization) will be ubiquitous. But within computational irreducibility there must always be slices of computational reducibility. And it’s these slices of reducibility that an observer must try to pick out—and that ultimately make it possible for a finite mind to develop a “useful narrative” about what happens in the world, that allows it to make decisions, predictions, and so on.

How “special” is what an observer does? At its core it’s just about taking a large set of possible inputs, and returning a much smaller set of possible outputs. And certainly that’s a conceptual idea that’s appeared in many fields under many different names: a contractive mapping, reduction to canonical form, a classifier, an acceptor, a forgetful functor, evolving to an attractor, extracting statistics, model fitting, lossy compression, projection, phase transitions, renormalization group transformations, coarse graining and so on. But here we want to think not about what’s “mathematically describable”, but instead about what in general is actually implemented—say by our senses, our measuring devices, or our ways of analyzing things.

At an ultimate level, everything that happens can be thought of as being captured by the ruliad—the unique object that emerges as the entangled limit of all possible computations. And in a vast generalization of the idea that our brains—like any other material thing—are made of atoms, so too any observer must be embedded as some kind of structure within the ruliad. But a key concept of observer theory is that it’s possible to draw conclusions about an observer’s impression of the world just by knowing about the capabilities—and assumptions—of the observer, without knowing in detail what the observer is “like inside”.

And so it is, for example, that in our Physics Project we seem to be able to derive—essentially from the structure of the ruliad—the core laws of twentieth-century physics (general relativity, quantum mechanics and the Second Law) just on the basis of two features of us as observers: that we’re computationally bounded, and that we believe we’re persistent in time (even though “underneath” we’re made of different atoms of space at every successive moment). And we can expect that if we were to include other features of us as observers (for example, that we believe there are persistent objects in the world, or that we believe we have free will) then we’d be able to derive more aspects of the universe as we experience it—or of natural laws we attribute to it.

But the notion of observers—and observer theory—isn’t limited purely to “physical observers”. It applies whenever we try to “get an impression” of something. And so, for example, we can also operate as “mathematical observers”, sampling the ruliad to build up conclusions about mathematical laws. Some features of us as physical observers—like the computational boundedness associated with the finiteness of our minds—inevitably carry over to us as mathematical observers. But other features do not. But the point of observer theory is to provide a general framework in which we can characterize observers—and then see the consequences of those characterizations for the impressions or conclusions observers will form.

The Operation of Observers

As humans we have senses like sight, hearing, touch, taste, smell and balance. And through our technology we also have access to a few thousand other kinds of measurements. So how basically do all these work?

The vast majority in effect aggregate a large number of small inputs to generate some kind of “average” output—which in the case of measurements is often specified as a (real) number. In a few cases, however, there’s instead a discrete choice between outputs that’s made on the basis of whether the total input exceeds a threshold (think: distributed consensus schemes, weighing balances, etc.).
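The two modes of operation can be sketched in a few lines of Python. This is a hypothetical toy, assuming inputs are plain real numbers: one observer reports an average, while the other behaves like a weighing balance that reports only which way the total tips relative to a threshold.

```python
import random


def averaging_observer(inputs):
    """Aggregate many small inputs into a single real-number reading (their mean)."""
    return sum(inputs) / len(inputs)


def balance_observer(inputs, threshold=0.0):
    """Discrete observer: report only which way the total tips relative to a threshold."""
    return "left" if sum(inputs) > threshold else "right"


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.gauss(1.0, 0.5) for _ in range(10_000)]
    b = [rng.gauss(1.0, 0.5) for _ in range(10_000)]
    # Two entirely different detailed inputs give almost the same averaged reading.
    print(round(averaging_observer(a), 2), round(averaging_observer(b), 2))
    # Near the balance's "edge", a tiny change in input flips the discrete output.
    print(balance_observer([0.5, -0.5, 1e-9]), balance_observer([0.5, -0.5, -1e-9]))  # left right
```

The averaging observer is insensitive to which particular 10,000 numbers it receives; the balance is insensitive to almost everything except inputs right at its threshold.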

But in all cases what’s fundamentally happening is that lots of different input configurations are all being equivalenced—or, more operationally, the dynamics of the system essentially make all equivalenced states evolve to the same “attractor state”.

As an example, let’s consider measuring the pressure of a gas. There are various ways to do this. But a very direct one is just to have a piston, and see how much force is exerted by the gas on this piston. So where does this force come from? At the lowest level it’s the result of lots of individual molecules bouncing off the surface of the piston, each transferring a tiny amount of momentum to it. If we looked at the piston at an atomic scale, we’d see it temporarily deform from each molecular impact. But the crucial point is that at a large scale the piston moves together, as a single rigid object—aggregating the effects of all those individual molecular impacts.

But why does it work this way? Essentially it’s because the intermolecular forces inside the piston are much stronger than the forces associated with molecules in the gas. Or, put more abstractly, there’s more coupling and coherence “inside the observer” than between the observer and what it’s observing.
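A rough numerical check of this aggregation, under idealized assumptions of our own (a one-dimensional box of length L, perfectly elastic collisions, molecules that never hit each other, arbitrary units): each molecule hits the piston once per round trip, every 2L/|v|, transferring momentum 2m|v|, so the time-averaged force should approach the aggregate value Σ m v²/L whatever the detailed microstate. The function names are hypothetical.

```python
import random


def piston_force(velocities, mass=1.0, length=1.0, duration=1000.0):
    """Time-averaged force on a piston at one end of a 1D box, summed over molecules.

    Each molecule bounces elastically between the walls, hitting the piston once
    per round trip (every 2*length/|v|) and transferring 2*mass*|v| each time.
    """
    impulse = 0.0
    for v in velocities:
        hits = int(abs(v) * duration / (2 * length))  # completed round trips
        impulse += hits * 2 * mass * abs(v)
    return impulse / duration


def kinetic_theory_force(velocities, mass=1.0, length=1.0):
    """Aggregate prediction F = sum(m v^2) / L, blind to the detailed microstate."""
    return sum(mass * v * v for v in velocities) / length


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        # A fresh, completely different microstate on each trial.
        vs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
        print(round(piston_force(vs)), round(kinetic_theory_force(vs)))
```

Each trial uses a completely different set of molecular velocities, yet the simulated impacts and the aggregate formula agree to within a fraction of a percent, and the totals vary by only a few percent between trials: the piston registers only the aggregate.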

We see the same basic pattern over and over again. There’s some form of transduction that couples the individual elements of what’s being observed to the observer. Then “within the observer” there’s something that in essence aggregates all these small effects. Sometimes that aggregation is “directly numerical”, as in the addition of lots of small momentum transfers. But sometimes it’s instead more explicitly like evolution to one attractor rather than another.

Consider, for example, the case of vision. An array of photons fall on the photoreceptor cells on our retinas, generating electrical signals transmitted through nerve fibers to our brains. Within the brain there’s then effectively a neural net that evolves to different attractors depending on what one’s looking at. Most of the time a small change in input image won’t affect what attractor one evolves to. But—much like with a weighing balance—there’s an “edge” at which even a small change can lead to a different output.

One can go through lots of different types of sensory systems and measuring devices. But the basic outline seems to always be the same. First, there’s a coupling between what is being sensed or measured and the thing that’s doing the sensing or measuring. Quite often that coupling involves transducing from one physical form to another—say from light to electricity, or from force to position. Sometimes then the crucial step of equivalencing different detailed inputs is achieved by simple “numerical aggregation”, most often by accumulation of objects (atoms, raindrops, etc.) or physical effects (forces, currents, etc.). But sometimes the equivalencing is instead achieved by a more obviously dynamical process.

It could amount to simple amplification, in which, for example, the presence of a small element of input (say, an individual particle) “tips over” some metastable system so that it goes into a certain final state. Or it could be more like a neural net where there’s a more complicated translation defined by hard-to-describe borders between basins of attraction leading to different attractors.
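The following Python sketch replaces the neural net with something far simpler (a hypothetical choice made only for illustration): a one-variable system evolving under dx/dt = x − x³, which has attractors at x = −1 and x = +1 whose basins meet at x = 0. Many different inputs end up at the same attractor, but near the boundary a tiny change flips the result, much like the “edge” of a weighing balance.

```python
def settle(x0, steps=2000, dt=0.01):
    """Evolve x under dx/dt = x - x**3 from x0 using simple Euler steps.

    The flow has two attractors, x = -1 and x = +1, whose basins meet at x = 0.
    Intended for inputs roughly in [-2, 2], where these Euler steps are stable.
    """
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return x


def perceive(x0):
    """Report which attractor the state ends up near, as a discrete label."""
    return "A" if settle(x0) > 0 else "B"


if __name__ == "__main__":
    # Many different inputs on one side of the boundary all give the same output...
    print([perceive(x) for x in (0.2, 0.5, 1.3, 1.9)])  # ['A', 'A', 'A', 'A']
    # ...but right at the edge between basins a tiny change flips the result.
    print(perceive(1e-6), perceive(-1e-6))  # A B
```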

But, OK, so what’s the endpoint of a process of observation? Ultimately for us humans it’s an impression created in our minds. Of course that gets into lots of slippery philosophical issues. Yes, each of us has an “inner experience” of what’s going on in our mind. But anything else is ultimately an extrapolation. We make the assumption that other human minds also “see what we see”, but we can never “feel it from the inside”.

We can of course make increasingly detailed measurements—say of neural activity—to see how similar what’s going on is between one brain and another. But as soon as there’s the slightest structural—or situational—difference between the brains, we really can’t say exactly how their “impressions” will compare.

But for our purposes in constructing a general “observer theory” we’re basically going to make the assumption (or, in effect, “philosophical approximation”) that whenever a system does enough equivalencing, that’s tantamount to it “acting like an observer”, because it can then act as a “front end” that takes the “incoherent complexity of the world” and “collimates it” to the point where a mind will derive a definite impression from it.

Of course, there’s still a lot of subtlety here. There has to be “just enough equivalencing” and not too much. For example, if all inputs were always equivalenced to the same output, there’d be nothing useful observed. And in the end there’s somehow got to be some kind of match between the compression of input achieved by equivalencing, and the “capacity” of the mind that’s ultimately deriving an impression from it.
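One crude way to make “just enough equivalencing” concrete, which is our assumption rather than anything the article specifies, is to measure the Shannon entropy of an observer’s output when all inputs are equally likely. An observer that equivalences nothing passes through all the input information; one that equivalences everything to a single output conveys zero bits. The observers below are hypothetical examples.

```python
import math
from collections import Counter
from itertools import product


def output_entropy(observer, n):
    """Shannon entropy, in bits, of an observer's output over all 2**n equally likely inputs."""
    counts = Counter(observer(c) for c in product((0, 1), repeat=n))
    total = 2 ** n
    return sum((k / total) * math.log2(total / k) for k in counts.values())


if __name__ == "__main__":
    n = 10
    observers = {
        "no equivalencing (identity)": lambda c: c,
        "count of ones": lambda c: sum(c),
        "majority vote": lambda c: sum(c) > n // 2,
        "everything equivalent": lambda c: 0,
    }
    for name, obs in observers.items():
        print(f"{name:30s} {output_entropy(obs, n):5.2f} bits")
```

The identity observer keeps all 10 bits and the constant observer keeps none; the intermediate observers keep only a few bits, which is the kind of reduction the text is describing.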

A crucial feature of anything that can reasonably be called a mind is that “something’s got to be going on in there”. It can’t be, for example, that the internal state of the system is fixed. There has to be some internal dynamics—some computational process that we can identify as the ongoing operation of the mind.

At an informational level we might say that there has to be more information processing going on inside than there is flow of information from the outside. Or, in other words, if we’re going to be meaningful “observers like us” we can’t just be bombarded by input we don’t process; we have to have some capability to “think about what we’re seeing”.

All of this comes back to the idea that a crucial feature of us as observers is that we are computationally bounded. We do computation; that’s why we can have an “inner sense of things going on”. But the amount of computation we do is tiny compared to the computation going on in the world around us. Our experience represents a heavily filtered version of “what’s happening outside”. And the essence of “being an observer like us” is that we’re effectively doing lots of equivalencing to get to that filtered version.

But can we imagine a future in which we “expand our minds”? Or perhaps encounter some alien intelligence with a fundamentally “less constrained mind”? Well, at some point there’s an issue with this. Because in a sense the idea that we have a coherent existence relies on us having “limited minds”. For without such constraints there wouldn’t be a coherent “self” that we could identify—with coherent inner experience.

Let’s say we’re shown some system—say in nature—“from the outside”. Can we tell if “there’s an observer in there”? Ultimately not, because in a sense we’d have to be “inside that observer” and be able to experience the impression of the world that it’s getting. But in much the same way as we extrapolate to believing that, say, other human minds are experiencing things like we’re experiencing, so also we can potentially extrapolate to say what we might think of as an observer.

And the core idea seems to be that an “observer” should be a subsystem whose “internal states” are affected by the rest of the system, but where many “external states” lead to the same internal state—and where there is rich dynamics “within the observer” that in effect operates only on its internal states. Ultimately—following the Principle of Computational Equivalence—both the outside and the inside of the “observer subsystem” can be expected to be equivalent in the computations they’re performing. But the point is that the coupling from outside the subsystem to inside effectively “coarse grains” what’s outside, so that the “inner computation” is operating on a much-reduced set of elements.

Why should any such “observer subsystems” exist? Presumably at some level it’s inevitable from the presence of pockets of computational reducibility within arbitrary computationally irreducible systems. But more important for us is that our very existence—and the possibility of our coherent inner experience—depends on us “operating as observers”. And—almost as a “self-fulfilling prophecy”—our behavior tends to perpetuate our ability to successfully do this. For example, we can think of us as choosing to put ourselves in situations and environments where we can “predict what’s going to happen” well enough to “survive as observers”. (At a mundane practical level we might do this by not living in places subject to unpredictable natural forces—or by doing things like building ourselves structures that shelter us from those forces.)

We’ve talked about observers operating by compressing the complexities of the world to “inner impressions” suitable for finite minds. And in typical situations that we describe as perception and measurement, the main way this happens is by fairly direct equivalencing of different states. But in a sense there’s a higher-level story that relies on formalization—and in essence computation—and that’s what we usually call “analysis”.

Let’s say we have some intricate structure—perhaps some nested, fractal pattern. A direct rendering of all the pixels in this pattern ultimately won’t be something well suited for a “finite mind”. But if we gave rules—or a program—for generating the pattern we’d have a much more succinct representation of it.
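To make the contrast concrete, here is a Python sketch using rule 90, an elementary cellular automaton (our choice of example; the article does not name a specific pattern), in which each cell becomes the XOR of its two neighbors. Starting from one black cell it produces a nested, Sierpinski-like pattern, yet the entire rule is specified by just 8 bits.

```python
def rule90_pattern(steps):
    """Rows of elementary cellular automaton rule 90, grown from a single black cell.

    Each new cell is the XOR of its left and right neighbors. The width 2*steps+1
    is large enough that the pattern never reaches the edges.
    """
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1
    rows = [row]
    for _ in range(steps):
        prev = rows[-1]
        row = [(prev[i - 1] if i > 0 else 0) ^ (prev[i + 1] if i < width - 1 else 0)
               for i in range(width)]
        rows.append(row)
    return rows


if __name__ == "__main__":
    rows = rule90_pattern(15)
    for r in rows:
        print("".join("#" if c else "." for c in r))
    pixels = sum(len(r) for r in rows)
    print(f"{pixels} cells, generated from an 8-bit rule (rule number 90)")
```

The printout has 16 × 31 = 496 cells, but all of them follow from the rule plus a single-cell initial condition. This particular pattern is simple and nested enough to be computationally reducible, unlike the general case discussed next.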

But now there’s a problem with computational irreducibility. Yes, the rules determine the pattern. But to get from these rules to the actual pattern can require an irreducible amount of computation. And to “reverse engineer the pattern” to find the rules can require even more computation.

Yes, there are particular cases—like repetitive and simple nested patterns—where there’s enough immediate computational reducibility that a computationally bounded system (or observer) can fairly easily “do the analysis” and “get the compression”. But in general it’s hard. And indeed in a sense it’s the whole mission of science to pick away at the problem, and try to find more ways to “reduce the complexities of the world” to “human-level narratives”.

Computational irreducibility limits the extent to which this can be successful. But the inevitable existence of pockets of reducibility even within computational irreducibility guarantees that progress can always in principle be made. As we invent more kinds of measuring devices we can extend our domain as observers. And the same is true when we invent more methods of analysis, or identify more principles in science.

But the overall picture remains the same: what’s crucial to “being an observer” is equivalencing many “states of the world”, either through perceiving or measuring only specific aspects of them, or through identifying “simplified narratives” that capture them. (In effect, perception and measurement tend to do “lossy compression”; analysis is more about “lossless compression” where the equivalencing is effectively not between possible inputs but between possible generative rules.)

How Observers Construct Their Perceived Reality

Our view of the world is ultimately determined by what we observe of it. We take what’s “out there in the world” and in effect “construct our perceived reality” by our operation as observers. Or, in other words, insofar as we have a narrative about “what’s going on in the world”, that’s something that comes from our operation as observers.

And in fact from our Physics Project we’re led to an extreme version of this—in which what’s “out there in the world” is just the whole ruliad, and in effect everything specific about our perceived reality must come from how we operate as observers and thus how we sample the ruliad.

But long before we get to this ultimate level of abstraction, there are lots of ways in which our nature as observers “builds” our perceived reality. Think about any material substance—like a fluid. Ultimately it’s made up of lots of individual molecules “doing their thing”. But observers like us aren’t seeing those molecules. Instead, we’re aggregating things to the point where we can just describe the system as a fluid, that operates according to the “narrative” defined by the laws of fluid mechanics.

But why do things work this way? Ultimately it’s the result of the repeated story of the interplay between underlying computational irreducibility, and the computational boundedness of us as observers. At the lowest level the motion of the molecules is governed by simple rules of mechanics. But the phenomenon of computational irreducibility implies that to work out the detailed consequences of “running these rules” involves an irreducible amount of computational work—which is something that we as computationally bounded observers can’t do. And the result of this is that we’ll end up describing the detailed behavior of the molecules as just “random”. As I’ve discussed at length elsewhere, this is the fundamental origin of the Second Law of thermodynamics. But for our purposes here the important point is that it’s what makes observers like us “construct the reality” of things like fluids. Our computational boundedness as observers makes us unable to trace all the detailed behavior of molecules, and leaves us “content” to describe fluids in terms of the “narrative” defined by the laws of fluid mechanics.

Our Physics Project implies that it’s the same kind of story with physical space. For in our Physics Project, space is ultimately “made” of a network of relations (or connections) between discrete “atoms of space”—that’s progressively being updated in what ends up being a computationally irreducible way. But we as computationally bounded observers can’t “decode” all the details of what’s happening, and instead we end up with a simple “aggregate” narrative, that turns out to correspond to continuum space operating according to the laws of general relativity.

The way both coherent notions of “matter” (or fluids) and spacetime emerge for us as observers can be thought of as a consequence of the equivalencing we do as observers. In both cases, there’s immense and computationally irreducible complexity “underneath”. But we’re ignoring most of that—by effectively treating different detailed behaviors as equivalent—so that in the end we get to a (comparatively) “simple narrative” more suitable for our finite minds. But we should emphasize that what’s “really going on in the system” is something much more complicated; it’s just that we as observers aren’t paying attention to that, so our perceived reality is much simpler.

OK, but what about quantum mechanics? In a sense that’s an extreme test of our description of how observers work, and the extent to which the operation of observers “constructs their perceived reality”.

The Case of Quantum Mechanics

In our Physics Project the underlying structure (hypergraph) that represents space and everything in it is progressively being rewritten according to definite rules. But the crucial point is that at any given stage there can be lots of ways this rewriting can happen. And the result is that there’s a whole tree of possible “states of the universe” that can be generated. So given this, why do we ever think that definite things happen in the universe? Why don’t we just think that there’s an infinite tree of branching histories for the universe?

Well, it all has to do with our nature as observers, and the equivalencing we do. At an immediate level, we can imagine looking at all those different possible branching paths for the evolution of the universe. And the key point is that even though they come from different paths of history, two states can just be the same. Sometimes it’ll be obvious that they’re the same; sometimes one might have to determine, say, whether two hypergraphs are isomorphic. But the point is that to any observer (at least one that isn’t managing to look at arbitrary “implementation details”), the states will inevitably be considered equivalent.
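A minimal sketch of this merging, assuming strings as a stand-in for hypergraphs (so that checking “the same state” is plain string equality rather than hypergraph isomorphism), and using the hypothetical rules A → AB and B → A applied at every possible position. For each distinct state it tracks how many different histories lead to it.

```python
from collections import Counter


def successors(state, rules):
    """All states reachable by applying one rule at one position (one multiway step)."""
    out = []
    for lhs, rhs in rules:
        start = state.find(lhs)
        while start != -1:
            out.append(state[:start] + rhs + state[start + len(lhs):])
            start = state.find(lhs, start + 1)
    return out


def multiway(initial, rules, steps):
    """Evolve a multiway system, merging identical states reached along different paths.

    Returns one Counter per step, mapping each distinct state to the number of
    distinct histories (paths from the initial state) that lead to it.
    """
    generations = [Counter({initial: 1})]
    for _ in range(steps):
        nxt = Counter()
        for state, n_paths in generations[-1].items():
            for child in successors(state, rules):
                nxt[child] += n_paths  # same string from different histories: merged
        generations.append(nxt)
    return generations


if __name__ == "__main__":
    rules = [("A", "AB"), ("B", "A")]
    for t, gen in enumerate(multiway("A", rules, 4)):
        print(t, "histories:", sum(gen.values()), "distinct states:", len(gen))
```

With these rules, after four steps there are 16 distinct histories but only 5 distinct states: an observer who cannot see how a state was reached treats all the histories ending in the same string as one.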

But now there’s a bigger point. Even though “from the outside” there might be a whole branching and merging multiway graph of histories for the universe, observers like us can’t trace that. And in fact all we perceive is a single thread of history. Or, said another way, we believe that we have a single thread of experience—something closely related to our belief that (despite the changing “underlying elements” from which we are made) we are somehow persistent in time (at least during the span of our existence).

But operationally, how do we go from all those underlying branches of history to our perceived single thread of history? We can think of the states on different threads of history as being related by what we call a branchial graph, that joins states that have immediate common ancestors. And in the limit of many threads, we can think of these different states as being laid out in “branchial space”. (In traditional quantum mechanics terms, this layout defines a “map of quantum entanglements”—with each piece of common ancestry representing an entanglement between states.)
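Continuing with a toy string model (strings standing in for hypergraphs, hypothetical rules A → AB and B → A), this sketch builds a branchial graph in the simplest sense described here, joining distinct states at the next step that share an immediate parent. It is an illustrative simplification, not the Physics Project’s full construction.

```python
from itertools import combinations


def successors(state, rules):
    """All states reachable by applying one rule at one position (one multiway step)."""
    out = []
    for lhs, rhs in rules:
        start = state.find(lhs)
        while start != -1:
            out.append(state[:start] + rhs + state[start + len(lhs):])
            start = state.find(lhs, start + 1)
    return out


def branchial_edges(generation, rules):
    """Link distinct next-step states that share an immediate parent (one-step common ancestor)."""
    edges = set()
    for parent in generation:
        children = sorted(set(successors(parent, rules)))
        for a, b in combinations(children, 2):  # sorted, so each pair has one canonical order
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    rules = [("A", "AB"), ("B", "A")]
    gen = {"A"}
    for _ in range(2):
        gen = {child for s in gen for child in successors(s, rules)}
    print(sorted(gen))  # ['AA', 'ABB']
    for a, b in sorted(branchial_edges(gen, rules)):
        print(a, "--", b)
```

Here AAB and ABA share two different parents (ABB and AA), yet they appear as a single branchial edge; the three resulting edges link AAB, ABA and ABBB.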

In physical space—whether we’re looking at molecules in a fluid or atoms of space—we can think of ourselves as observers who are physically large enough to span many underlying discrete elements, so that what we end up observing is just some kind of aggregate, averaged result. And it’s very much the same kind of thing in branchial space: we as observers tend to be large enough in branchial space to be spread across an immense number of branches of history, so that what we observe is just aggregate, averaged results across all those branches.

There’s lots of detailed complexity in what happens on different branches, just like there is in what happens to different molecules, or different atoms of space. And the reason is that there’s inevitably computational irreducibility, or, in this case, more accurately, multicomputational irreducibility. But as computationally bounded observers we just perceive aggregate results that “average out” the “underlying apparent randomness” to give a consistent single thread of experience.

And effectively this is what happens in the transition from quantum to classical behavior. Even though there are many possible detailed (“quantum”) threads of history that an object can follow, what we perceive corresponds to a single consistent “aggregate” (“classical”) sequence of behavior.

And this is usually true even at the level of our typical observation of molecules and chemical processes. Yes, there are many possible threads of history for, say, a water molecule. But most of our observations aggregate things to the point where we can talk about a definite shape for the molecule, with definite “chemical bonds”, etc.

But there is a special situation that actually looms large in typical discussions of quantum mechanics. We can think of it as the result of doing measurements that aren’t “aggregating threads of history to get an average”, but are instead doing something more like a weighing balance, always “tipping” one way or the other. In the language of quantum computing, we might say that we’re arranging things to be able to “measure a single qubit”. In terms of the equivalencing of states, we might say that we’re equivalencing lots of underlying states to specific canonical states (like “spin up” and “spin down”).

Why do we get one outcome rather than another? Ultimately we can think of it as all depending on the details of us as observers. To see this, let’s start from the corresponding question in physical space. We might ask why we observe some particular thing happening. Well, in our Physics Project everything about “what happens” is deterministic. But there’s still the “arbitrariness” of where we are in physical space. We’ll always basically see the same laws of physics, but the particulars of what we’ll observe depend on where we are, say on the surface of the Earth versus in interstellar space, etc.

Is there a “theory” for “where we are”? In some sense, yes, because we can go back and see why the molecules that make us up landed up in the particular place where they did. But what we can’t have an “external theory” for is just which molecules end up making up “us”, as we experience ourselves “from inside”. In our view of physics and the universe, it’s in some sense the only “ultimately subjective” thing: where our internal experience is “situated”.

And the point is that basically—even though it’s much less familiar—the same thing is going on at the level of quantum mechanics. Just as we “happen” to be at a certain place in physical space, so we’re at a certain place in branchial space. Looking back we can trace how we got here. But there’s no a priori way to determine “where our particular experience will be situated”. And that means we can’t know what the “local branchial environment” will be—and so, for example, what the outcome of “balance-like” measurements will be.

Just as in traditional discussions of quantum mechanics, the mechanics of doing the measurement—which we can think of as effectively equivalencing many underlying branches of history—will have an effect on subsequent behavior, and subsequent measurements.

But let’s say we look just at the level of the underlying multiway graph—or, more specifically, the multiway causal graph that records causal connections between different updating events. Then we can identify a complicated web of interdependence between events that are timelike, spacelike and branchlike separated. And this interdependence seems to correspond precisely to what’s expected from quantum mechanics.

In other words, even though the multiway graph is completely determined, the arbitrariness of “where the observer is” (particularly in branchial space), combined with the inevitable interdependence of different aspects of the multiway (causal) graph, seems sufficient to reproduce the not-quite-purely-probabilistic features of quantum mechanics.

In making observations in physical space, it’s common to make a measurement at one place or time, then make another measurement at another place or time, and, for example, see how they’re related. But in actually doing this, the observer will have to move from one place to the other, and persist from one time to another. And in the abstract it’s not obvious that that’s possible. For example, it could be that an observer won’t be able to move without changing—or, in other words, that “pure motion” won’t be possible for an observer. But in effect this is something we as observers assume about ourselves. And indeed, as I’ve discussed elsewhere, this is a crucial part of why we perceive spacetime to operate according to the laws of physics we know.

But what about in branchial space? We have much less intuition for this than for physical space. But we still effectively believe that pure motion is possible for us as observers in branchial space. It could be—like an observer in physical space, say, near a spacetime singularity—that an observer would get “shredded” when trying to “move” in branchial space. But our belief is that typically nothing like that happens. At some level being at different locations in branchial space presumably corresponds to picking different bases for our quantum states, or effectively to defining our experiments differently. And somehow our belief in the possibility of pure motion in branchial space seems related to our belief in the possibility of making arbitrary sequences of choices in the sets of experiments we do.

Observers of Abstract Worlds

We might have thought that the only thing ultimately “out there” for us to observe would be our physical universe. But actually there are important situations where we’re essentially operating not as observers of our familiar physical universe, but instead as observers of what amount to abstract universes. And what we’ll see is that the ideas of observer theory seem to apply there too—except that now what we’re picking out and reducing to “internal impressions” are features not of the physical world but of abstract worlds.

Our Physics Project in a sense brings ideas about the physical and abstract worlds closer—and the concept of the ruliad ultimately leads to a deep unification between them. For what we now imagine is that the physical universe as we perceive it is just the result of the particular kind of sampling of the ruliad made by us as certain kinds of observers. And the point is that we as observers can make other kinds of samplings, leading to what we can describe as abstract universes. And one particularly prominent example of this is mathematics, or rather, metamathematics.

Imagine starting from all possible axioms for mathematics, then constructing the network of all possible theorems that can be derived from them. We can consider this as forming a kind of “metamathematical universe”. And the particular mathematics that some mathematician might study we can then think of as the result of a “mathematical observer” observing that metamathematical universe.

There are both close analogies and differences between this and the experience of a physical observer in the physical universe. Both ultimately correspond to samplings of the ruliad, but somewhat different ones.

In our Physics Project we imagine that physical space and everything in it is ultimately made up of discrete elements that we identify as “atoms of space”. But in the ruliad in general we can think of everything being made up of “pure atoms of existence” that we call emes. In the particular case of physics we interpret these emes as atoms of space. But in metamathematics we can think of emes as corresponding to (“subaxiomatic”) elements of symbolic structures—from which things like axioms or theorems can be constructed.

A central feature of our interaction with the ruliad for physics is that observers like us don’t track the detailed behavior of all the various atoms of space. Instead, we equivalence things to the point where we get descriptions that are reduced enough to “fit in our minds”. And something similar is going on in mathematics.

We don’t track all the individual subaxiomatic emes—or usually in practice even the details of fully formalized axioms and theorems. Instead, mathematics typically operates at a much higher and “more human” level, dealing not with questions like how real numbers can be built from emes—or even axioms—but rather with what can be deduced about the properties of mathematical objects like real numbers. In a physics analogy to the behavior of a gas, typical human mathematics operates not at the “molecular” level of individual emes (or even axioms) but rather at the “fluid dynamics” level of “human-accessible” mathematical concepts.

In effect, therefore, a mathematician is operating as an observer who equivalences many detailed configurations—ultimately of emes—in order to form higher-level mathematical constructs suitable for our computationally bounded minds. And while at the outset one might have imagined that anything in the ruliad could serve as a “possible mathematics”, the point is that observers like us can only sample the ruliad in particular ways—leading to only particular possible forms for “human-accessible” mathematics.

It’s a very similar story to the one we’ve encountered many times in thinking about physics. In studying gases, for example, we could imagine all sorts of theories based on tracking detailed molecular motions. But for observers like us—with our computational boundedness—we inevitably end up with things like the Second Law of thermodynamics, and the laws of fluid mechanics. And in mathematics the main thing we end up with is “higher-level mathematics”—mathematics that we can do directly in terms of typical textbook concepts, rather than constantly having to “drill down” to the level of axioms, or emes.

In physics we’re usually particularly concerned with issues like predicting how things will evolve through time. In mathematics it’s more about accumulating what can be considered true. And indeed we can think of an idealized mathematician as going through the ruliad and collecting in their minds a “bag” of theorems (or axioms) that they “consider to be true”. And given such a collection, they can essentially follow the “entailment paths” defined by computations in the ruliad to find more theorems to “add to their bag”. (And, yes, if they put in a false theorem then—because a false premise in the standard setup of logic implies everything—they’ll end up with an “infinite explosion of theorems”, that won’t fit in a finite mind.)
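The “bag of theorems” picture can be sketched as forward chaining to a fixed point. This is a toy propositional version, assuming statements are plain strings, entailment steps are Horn-style rules (a set of premises and one conclusion), and the universe of statements is a small finite set. The real explosion is infinite; a finite universe is used here only so the loop terminates. The explosion itself is modeled by one hypothetical rule: if a statement and its negation are both in the bag, every statement in the universe is added.

```python
def closure(bag: set[str],
            rules: list[tuple[frozenset[str], str]],
            universe: set[str]) -> set[str]:
    """Follow entailment rules until no new statement can be added.

    A contradiction (some s together with "not s") triggers ex falso
    quodlibet: the whole finite universe of statements enters the bag.
    """
    bag = set(bag)
    changed = True
    while changed:
        changed = False
        # Follow every entailment path whose premises are already in the bag.
        for premises, conclusion in rules:
            if premises <= bag and conclusion not in bag:
                bag.add(conclusion)
                changed = True
        # A false premise "implies everything".
        if any(("not " + s) in bag for s in bag):
            if not universe <= bag:
                bag |= universe
                changed = True
    return bag
```

With rules `p → q` and `q → r`, starting from `{"p"}` gives just `{"p", "q", "r"}`, while adding the single false statement `"not r"` makes the bag swell to the entire universe.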

In observing the physical universe, we talk about our different possible senses (like vision, hearing, etc.) or different kinds of measuring devices. In observing the metamathematical universe the analogy is basically different possible kinds of theories or abstractions—say, algebraic vs. geometrical vs. topological vs. categorical, etc. (with new approaches being like new kinds of measuring devices).

Particularly when we think in terms of the ruliad we can expect a certain kind of ultimate unity in the metamathematical universe—but different theories and different abstractions will pick up different aspects of it, just as vision and hearing pick up different aspects of the physical universe. But in a sense observer theory gives us a global way to talk about this, and to characterize what kinds of observations observers like us can make—whether of the physical universe or the metamathematical one.

In physics we’ve then seen in our Physics Project how this allows us to find general laws that describe our perception of the physical world—and that turn out to reproduce the core known laws of physics. In mathematics we’re not as familiar with the concept of general laws, though the very fact that higher-level mathematics is possible is presumably in essence such a law, and perhaps the kinds of regularities seen in areas like category theory are others—as are the inevitable dualities we expect to be able to identify between different fields of mathematics. All these laws ultimately rely on the structure of the ruliad. But the crucial point is that they’re not talking about the “raw ruliad”; instead they’re talking about just certain samplings of the ruliad that can be done by observers like us, and that lead to certain kinds of “internal impressions” in terms of which these laws can be stated.

Mathematics represents a certain kind of abstract setup that’s been studied in a particularly detailed way over the centuries. But it’s not the only kind of “abstract setup” we can imagine. And indeed there’s even a much more familiar one: the use of concepts—and words—in human thinking and language.

We might imagine that at some time in the distant past our forebears could signify, say, rocks only by pointing at individual ones. But then there emerged the general notion of “rock”, captured by a word for “rock”. And once again this is a story of observers and equivalences. When we look at a rock, it presumably produces all sorts of detailed patterns of neuron firings in our brains, different for each particular rock. But somehow—presumably essentially through evolution to an attractor in the neural net in our brains—we equivalence all these patterns to extract our “inner impression” of the “concept of a rock”.

In the typical tradition of quantitative science we tend to be interested in doing measurements that lead to things like numerical results. But in representing the world using language we tend to be interested instead in creating symbolic structures that involve collections of discrete words embedded in a grammatical framework. Such linguistic descriptions don’t capture every detail; in a typical observer kind of way they broadly equivalence many things—and in a sense reduce the complexity of the world to a description in terms of a limited number of discrete words and linguistic forms.

Within any given person’s brain there’ll be “thoughts” defined by patterns of neuron firings. And the crucial role of language is to provide a way to robustly “package up” those thoughts, and for example represent them with discrete words, so they can be communicated to another person—and unpacked in that person’s brain to produce neuron firings that reproduce what amount to those same thoughts.

When we’re dealing with something like a numerical measurement we might imagine that it could have some kind of absolute interpretation. But words are much more obviously an “arbitrary basis” for communication. We could pick a different specific word (say from a different human language) but still “communicate the same thing”. All that’s required is that everyone who’s using the word agrees on its meaning. And presumably that normally happens because of shared “social” history between people who use a given word.

It’s worth pointing out that for this to work there has to be a certain separation of scales. The collective impression of the meaning of a word may change over time, but that change has to be slow compared to the rate at which the word is used in actual communication. In effect, the meaning of a word—as we humans might understand it—emerges from the aggregation of many individual uses.

In the abstract, there might not be any reason to think that there’d be a way to “understand words consistently”. But it’s a story very much like what we’ve encountered in both physics and mathematics. Even though there are lots of complicated individual details “underneath”, we as observers manage to pick out features that are “simple enough for us to understand”. In the case of molecules in a gas that might be the overall pressure of the gas. And in the case of words it’s a stable notion of “meaning”.

Put another way, the possibility of language is another example of observer theory at work. Inside our brains there are all sorts of complicated neuron firings. But somehow these can be “packaged up” into things like words that form “human-level narratives”.

There’s a certain complicated feedback loop between the world as we experience it and the words we use to describe it. We invent words for things that we commonly encounter (“chair”, “table”, …). Yet once we have a word for something we’re more able to form thoughts about it, or communicate about it. And that in turn makes us more likely to put instances of it in our environment. In other words, we tend to build our environment so that the way we have of making narratives about it works well—or, in effect, so our inner description of it can be as simple as possible, and it can be as predictable to us as possible.

We can view our experience of physics and of mathematics as being the result of us acting as physical observers and mathematical observers. Now we’re viewing our experience of the “conceptual universe” as being the result of us acting as “conceptual observers”. But what’s crucial is that in all these cases, we have the same intrinsic features as observers: computational boundedness and a belief in persistence. The computational boundedness is what makes us equivalence things to the point where we can have symbolic descriptions of the world, for example in terms of words. And the belief in persistence is what lets those words have persistent meanings.

And actually these ideas extend beyond just language—to paradigms, and general ways of thinking about things. When we define a word we’re in effect defining an abstraction for a class of things. And paradigms are somehow a generalization of this: ways of taking lots of specifics and coming up with a uniform framework for them. And when we do this, we’re in effect making a classic observer theory move—and equivalencing lots of different things to produce an “internal impression” that’s “simple enough” to fit in our finite minds.

In the End It’s All Just the Ruliad

Our tendency as observers is always to believe that we can separate our “inner experience” from what’s going on in the “outside world”. But in the end everything is just part of the ruliad. And at the level of the ruliad we as observers are ultimately “made of the same stuff” as everything else.

But can we imagine that we can point at one part of the ruliad and say “that’s an observer”, and at another part and say “that’s not”? At least to some extent the answer is presumably yes—at least if we restrict ourselves to “observers like us”. But it’s a somewhat subtle—and seemingly circular—story.

For example, one core feature of observers like us is that we have a certain persistence, or at least we believe we have a certain persistence. But, inevitably, at the level of the “raw ruliad”, we’re continually being made from different atoms of existence, i.e. different emes. So in what sense are we persistent? Well, the point is that an observer can equivalence those successive patterns of emes, so that what they observe is persistent. And, yes, this is at least on the face of it circular. And ultimately to identify what parts of the ruliad might be “persistent enough to be observers”, we’ll have to ground this circularity in some kind of further assumption.

What about the computational boundedness of observers like us, which forces us to do lots of equivalencing? At some level that equivalencing must be implemented by lots of different states evolving to the same states. But once again there’s circularity, because even to define what we mean by “the same states” (“Are isomorphic graphs the same?”, etc.) we have to be imagining certain equivalencing.

So how do we break out of the circularity? The key is presumably the presence of additional features that define “observers like us”. And one important class of such features has to do with scale.

We’re neither tiny nor huge. We involve enough emes that consistent averages can emerge. Yet we don’t involve so many emes that we span anything but an absolutely tiny part of the whole ruliad.

And actually a lot of our experience is determined by “our size as observers”. We’re large enough that certain equivalencing is inevitable. Yet we’re small enough that we can reasonably think of there being many choices for “where we are”.

The overall structure of the ruliad is a matter of formal necessity; there’s only one possible way for it to be. But there’s contingency in our character as observers. And for example in a sense there’s a fundamental constant of nature as we perceive it, which is our extent in the ruliad, say measured in emes (and appropriately projected into physical space, branchial space, etc.).

And the fact that this extent is small compared to the whole ruliad means that there are “many possible observers”—who we can think of as existing at different positions in the ruliad. And those different observers will look at the ruliad from different “points of view”, and thus develop different “internal impressions” of “perceived reality”.

But a crucial fact central to our Physics Project is that there are certain aspects of that perceived reality that are inevitable for observers like us—and that correspond to core laws of physics. But when it gets to more specific questions (“What does the night sky look like from where you are?”, etc.) different observers will inevitably have different versions of perceived reality.

So is there a way to translate from one observer to another? Essentially that’s a story of motion. What happens when an observer at one place in the ruliad “moves” to another place? Inevitably, the observer will be “made of different emes” if it’s at a different place. But will it somehow still “be the same”? Well, that’s a subtle question, which depends both on the background structure of the ruliad and on the nature of the observer.

If the ruliad is “too wild” (think: spacetime near a singularity) then the observer will inevitably be “shredded” as it “moves”. But computational irreducibility implies a certain overall regularity to most of the ruliad, making “pure motion” at least conceivable. But to achieve “pure motion” the observer still has to be “made of” something that is somehow robust—essentially some “lump of computational reducibility” that can “predictably survive” the underlying background of computational irreducibility.

In spacetime we can identify such “lumps” with things like black holes, and particles like electrons, photons, etc. (and, yes, in our models there’s probably considerable commonality between black holes and particles). It’s not yet clear quite what the analog is in branchial space, though a very simple example might involve persistence of qubits. And in rulial space, one kind of analog is the very notion of concepts. For in effect concepts (as represented for example by words) are the analog of particles in rulial space: they are the robust structures that can move across rulial space and “maintain their identity”, carrying “the same thoughts” to different minds.

So what does all this mean for what can constitute an observer in the ruliad? Observers in effect leverage computational reducibility to extract simplified features that can “fit in finite minds”. But observers themselves must also embody computational reducibility in order to maintain their own persistence and the persistence of the features they extract. Or in other words, observers must in a sense always correspond to “patches of regularity” in the ruliad.

But can any patch of regularity in the ruliad be thought of as an observer? Probably not usefully so. Because another feature of observers like us is that we are connected in some kind of collective “social” framework. Not only do we individually form internal impressions in our minds, but we also communicate these impressions. And indeed without such communication we wouldn’t, for example, be able to set up things like coherent languages with which to describe things.

What We Assume about Ourselves

A key implication of our Physics Project and the concept of the ruliad is that we perceive the universe to be the way we do because we are the way we are as observers. And the most fundamental aspect of observers like us is that we’re doing lots of equivalencing to reduce the “complexity of the world” to “internal impressions” that “fit into our minds”. But just what kinds of equivalencing are we actually doing? At some level a lot of that is defined by the things we believe—or assume—about ourselves and the way we interact with the world.

A very central assumption we make is that we’re somehow “stable observers” of a changing “outside world”. Of course, at some level we’re actually not “stable” at all: we’re built up from emes whose configuration is changing all the time. But our belief in our own stability—and, in effect, our belief in our “persistence in time”—makes us equivalence those configurations. And having done that equivalencing we perceive the universe to operate in a certain way, that turns out to align with the laws of physics we know.

But actually there’s more than just our assumption of persistence in time. For example, we also have an assumption of persistence in space: we assume that—at least on reasonably short timescales—we’re consistently “observing the universe from the same place”, and not, say, “continually darting around”. The network that represents space is continually changing “around us”. But we equivalence things so that we can assume that—in a first approximation—we are “staying in the same place”.

Of course, we don’t believe that we have to stay in exactly the same place all the time; we believe we’re able to move. And here we make what amounts to another “assumption of stability”: we assume that pure motion is possible for us as observers. In other words, we assume that we can “go to different places” and still be “the same us”, with the same properties as observers.

At the level of the “raw ruliad” it’s not at all obvious that such assumptions can be consistently made. But as we discussed above, the fact that for observers like us they can (at least to a good approximation) is a reflection of certain properties of us as observers—in particular of our physical scale, being large in terms of atoms of space but small in terms of the whole universe.

Related to our assumption about motion is our assumption that “space exists”—or that we can treat space as something coherent. Underneath, there’s all sorts of complicated dynamics of changing patterns of emes. But on the timescales at which we experience things we can equivalence these patterns to allow us to think of space as having a “coherent structure”. And, once again, the fact that we can do this is a consequence of physical scales associated with us as observers. In particular, the speed of light is “fast enough” that it brings information to us from the local region around us in much less time than it takes our brain to process it. And this means that we can equivalence all the different ways in which different pieces of information reach us, and we can consistently just talk about the state of a region of space at a given time.
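For a rough sense of the scales involved, take (as illustrative assumptions, not figures from the text) a local region about 10 meters across and a neural processing time on the order of 10 milliseconds:

```latex
t_{\text{light}} = \frac{d}{c} \approx \frac{10\ \text{m}}{3\times 10^{8}\ \text{m/s}} \approx 3.3\times 10^{-8}\ \text{s},
\qquad
\frac{t_{\text{light}}}{t_{\text{brain}}} \approx \frac{3.3\times 10^{-8}\ \text{s}}{10^{-2}\ \text{s}} \approx 3\times 10^{-6}.
```

On these assumptions light crosses the region hundreds of thousands of times faster than the brain processes what arrives, so the different orders in which pieces of information reach us can safely be equivalenced away.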

Part of our assumption that we’re “persistent in time” is that our thread of experience is—at least locally—continuous, with no breaks. Yes, we’re born and we die—and we also sleep. But we assume that at least on scales relevant for our ongoing perception of the world, we experience time as something continuous.

More than that, we assume that we have just a single thread of experience. Or, in other words, that there’s always just “one us” going through time. Of course, even at the level of neurons in our brains all sorts of activity goes on in parallel. But somehow in our normal psychological state we seem to concentrate everything so that our “inner experience” follows just one “thread of history”, on which we can operate in a computationally bounded way, and form definite memories and have definite sequences of thoughts.

We’re not as familiar with branchial space as with physical space. But presumably our “fundamental assumption of stability” extends there as well. And when combined with our basic computational boundedness it then becomes inevitable that (as we discussed above) we’ll conflate different “quantum paths of history” to give us as observers a definite “classical thread of inner experience”.

Beyond “stability”, another very important assumption we implicitly make about ourselves is what amounts to an assumption of “independence”. We imagine that we can somehow separate ourselves off from “everything else”. And one aspect of this is that we assume we’re localized—and that most of the ruliad “doesn’t matter to us”, so that we can equivalence all the different states of the “rest of the ruliad”.

But there’s also another aspect of “independence”: that in effect we can choose to do “whatever we want” independent of the rest of the universe. And this means that we assume we can, for example, essentially “do any possible experiment”, make any possible measurement—or “go anywhere we want” in physical or branchial space, or indeed rulial space. We assume that we effectively have “free will” about these things—determined only by our “inner choices”, and independent of the state of the rest of the universe.

Ultimately, of course, we’re just part of the ruliad, and everything we do is determined by the structure of the ruliad and our history within it. But we can view our “belief in freedom” as a reflection of the fact that we don’t know a priori where we’ll be located in the ruliad—and even if we did, computational irreducibility would prevent us from making predictions about what we will do.

Beyond our assumptions about our own “independence from the rest of the universe”, there’s also the question of independence between different parts of what we observe. And quite central to our way of “parsing the world” is our typical assumption that we can “think about different things separately”. In other words, we assume it’s possible to “factor” what we see happening in the universe into independent parts.

In science, this manifests itself in the idea that we can do “controlled experiments” in which we study how something behaves in isolation from everything else. It’s not self-evident that this will be possible (and indeed in areas like ethics it might fundamentally not be), but we as observers tend to implicitly assume it.

And actually, we normally go much further. Because we typically assume that we can describe—and think about—the world “symbolically”. In other words, we assume that we can take all the complexity of the world and represent at least the parts of it that we care about in terms of discrete symbolic concepts, of the kind that appear in human (or computational) language. There’s lots of detail in the world that our limited collection of symbolic concepts doesn’t capture, and effectively “equivalences out”. But the point is that it’s this symbolic description that normally seems to form the backbone of the “inner narrative” we have about the world.

There’s another implicit assumption that’s being made here, however. And that’s that there’s some kind of stability in the symbolic concepts we’re using. Yes, any particular mind might parse the world using a particular set of symbolic concepts. But we make the implicit assumption that there are other minds out there that work like ours. And this makes us imagine that there can be some form of “objective reality” that’s just “always out there”, to be sampled by whatever mind might happen to come along.

Not only, therefore, do we assume our own stability as observers; we also assume a certain stability to what we perceive of “everything that’s out there”. Underneath, there’s all the wildness and complexity of the ruliad. But we assume that we can successfully equivalence things to the point where all we perceive is something quite stable—and something that we can describe as ultimately governed by consistent laws.

It could be that every part of the universe just “does its own thing”, with no overall laws tying everything together. But we make the implicit assumption that, no, the universe—at least as far as we perceive it—is a more organized and consistent place. And indeed it’s that assumption that makes it feasible for us to operate as observers like us at all, and to even imagine that we can usefully reduce the complexity of the world to something that “fits in our finite minds”.

The Cost of Observation

What resources does it take for an observer to make an observation? In most of traditional science, observation is at best added as an afterthought, and no account is taken of the process by which it occurs. And indeed, for example, in the traditional formalism of quantum mechanics, while “measurement” can have an effect on a system, it’s still assumed to be an “indivisible act” without any “internal process”.

But in observer theory, we’re centrally talking about the process of observation. And so it makes sense to try asking questions about the resources involved in this process.

We might start with our own everyday experience. Something happens out in the world. What resources—and, for example, how much time—does it take us to “form an impression of it”? Let’s say that out in the world a cat either comes into view or it doesn’t. There are signals that come to our brain from our eyes, effectively carrying data on each pixel in our visual field. Then, inside our brain, these signals are processed by a succession of layers of neurons, with us in the end concluding either “there’s a cat there”, or “there’s not”.

And from artificial neural nets we can get a pretty good idea of how this likely works. And the key to it—as we discussed above—is that there’s an attractor. Lots of different detailed configurations of pixels all evolve either to the “cat” or “no cat” final state. The different configurations have been equivalenced, so that only a “final conclusion” survives.

The story is a bit trickier though. Because “cat” or “no cat” really isn’t the final state of our brain; hopefully it’s not the “last thought we have”. Instead, our brain will continue to “think more thoughts”. So “cat”/“no cat” is at best some kind of intermediate waypoint in our process of thinking; an instantaneous conclusion that we’ll continue to “build on”.

And indeed when we consider measuring devices (like a piston measuring the pressure of a gas) we similarly usually imagine that they will “come to an instantaneous conclusion”, but “continue operating” and “producing more data”. But how long should we wait for each intermediate conclusion? How long, for example, will it take for the stresses generated by a particular pattern of molecules hitting a piston to “dissipate out”, and for the piston to be “ready to produce more data”?

There are lots of specific questions of physics here. But if our purpose is to build a formal observer theory, how should we think about such things? There is something of an analogy in the formal theory of computation. An actual computational system—say in the physical world—will just “keep computing”. But in formal computation theory it’s useful to talk about computations that halt, and about functions that can be “evaluated” and give a “definite answer”. So what’s the analog of this in observer theory?

Instead of general computations, we’re interested in computations that effectively “implement equivalences”. Or, put another way, we want computations that “destroy information”—and that have many incoming states but few outgoing ones. As a practical matter, we can either have the outgoing states explicitly represent whole equivalence classes, or they can just be “canonical representatives”—like in a network where at each step each element takes on whatever the “majority” or “consensus” value of its neighbors was.
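The following is a rough illustration of a computation that “destroys information” in this way. It is a minimal Python sketch, not code from the post; Python stands in for the Wolfram Language used elsewhere in these writings. It assumes the simplest consensus network one could pick: a ring of 0/1 elements in which each element takes the majority value of itself and its two neighbors. The names `majority_step` and `equivalence_classes` are hypothetical. The sketch runs all 2^n inputs for n steps and groups them by where they end up. Many incoming states collapse onto far fewer outgoing ones, and each output acts as a canonical representative of its class.

```python
from itertools import product

def majority_step(state):
    """One synchronous update on a ring of 0/1 elements: each element takes
    the majority value of itself and its two nearest neighbors."""
    n = len(state)
    return tuple(
        1 if state[(i - 1) % n] + state[i] + state[(i + 1) % n] >= 2 else 0
        for i in range(n)
    )

def equivalence_classes(n, steps):
    """Group all 2**n initial states by the state they reach after `steps`
    updates. Returns a dict mapping each output state to the list of inputs
    that lead to it."""
    classes = {}
    for state in product((0, 1), repeat=n):
        out = state
        for _ in range(steps):
            out = majority_step(out)
        classes.setdefault(out, []).append(state)
    return classes

if __name__ == "__main__":
    n = 7
    classes = equivalence_classes(n, steps=n)
    print(f"{2 ** n} input states -> {len(classes)} output states")
```

On a ring of odd size, every state settles into a fixed point within n steps, because every element then agrees with at least one neighbor. On a ring of even size, the perfectly alternating pattern instead flips back and forth forever. So even a simple consensus rule need not always produce a single "conclusion".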

But however it works, we can still ask questions about what computational resources were involved. How many steps did it take? How many elements were involved?

And with the idea that observers like us are “computationally bounded”, we expect limitations on these resources. But with this formal setup we can start asking just how far an observer like us can get, say in “coming to a conclusion” about the results of some computationally irreducible process.

An interesting case arises in putative quantum computers. In the model implied by our Physics Project, such a “quantum computer” effectively “performs many computations in parallel” on the separate branches of a multiway system representing the various threads of history of the universe. But if the observer tries to “come to a conclusion” about what actually happened, they have to “knit together” all those threads of history, in effect by implementing equivalences between them.

One could in principle imagine an observer who’d just follow all the quantum branches. But it wouldn’t be an observer like us. Because what seems to be a core feature of observers like us is that we believe we have just a single thread of experience. And to maintain that belief, our “process of observation” must equivalence all the different quantum branches.

How much “effort” will that be? Well, inevitably if a thread of history branched, our equivalencing has to “undo that branching”. And that suggests that the number of “elementary equivalencings” will have to be at least comparable to the number of “elementary branchings”—making it seem that the “effort of observation” will tend to be at least comparable to the reduction of effort associated with parallelism in the “underlying quantum process”.

In general it’s interesting to compare the “effort of observation” with the “effort of computation”. With our concept of “elementary equivalencings” we have a way to measure both in terms of computational operations. And, yes, both could in principle be implemented by something like a Turing machine, though in practice the equivalencings might be most conveniently modeled by something like string rewriting.

And indeed one can often go much further, talking not directly in terms of equivalencings, but rather about processes that show attractors. There are different kinds of attractors. Sometimes—as in class 1 cellular automata—there are just a limited number of static, global fixed points (say, either all cells black or all cells white). But in other cases—such as class 3 cellular automata—the number of “output states” may be smaller than the number of “input states” but there may be no computationally simple characterization of them.
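The difference between these kinds of attractors can be probed with a small experiment. The Python sketch below uses hypothetical helper names and assumes elementary cellular automata on a ring of n cells. It counts how many distinct states remain after n steps, starting from all 2^n initial states. Rule 254 serves as a class 1 example and rule 30 as a class 3 example. Rule 204 (the identity rule) is a baseline in which nothing gets equivalenced.

```python
from itertools import product

def eca_step(state, rule):
    """One step of an elementary cellular automaton on a ring, using the
    standard rule numbering: bit (4*left + 2*center + right) of `rule`
    gives the new value of the center cell."""
    n = len(state)
    return tuple(
        (rule >> (4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n])) & 1
        for i in range(n)
    )

def reachable_states(rule, n, steps):
    """Distinct states present after `steps` steps, over all 2**n initial states."""
    finals = set()
    for state in product((0, 1), repeat=n):
        for _ in range(steps):
            state = eca_step(state, rule)
        finals.add(state)
    return finals

if __name__ == "__main__":
    n = 10
    for rule in (204, 254, 30):
        print(f"rule {rule}: {len(reachable_states(rule, n, steps=n))} of {2 ** n} states remain")
```

Rule 254 ends with just two states (all white or all black). For rule 30, the count is simply whatever the enumeration produces; the sketch makes no claim about how it scales with n.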

“Observers like us”, though, mostly seem to make use of the fixed points. We try to “symbolicize the world”, taking all the complexities “out there”, and reducing them to “discrete conclusions” that we might, for example, describe using the discrete words in a language.

There’s an immediate subtlety associated with attractors of any kind, though. Typical physics is reversible, in the sense that any process (say two molecules scattering from each other) can run equally well forwards and backwards. But in an attractor one goes from lots of possible initial states to a smaller number of “attractor” final states. And there are two basic ways this can happen, even when there’s underlying reversibility. First, the system one’s studying can be “open”, in the sense that effects can “radiate” out of the region that one’s studying. And second, the states the system gets into can be “complicated enough” that, say, a computationally bounded observer will inevitably equivalence them. And indeed that’s the main thing that’s happening, for example, when a system “reaches thermodynamic equilibrium”, as described by the Second Law.

And actually, once again, there’s often a certain circularity. One is trying to determine whether an observer has “finished observing” and “come to a conclusion”. But one needs an observer to make that determination. Can we tell if we’ve finished “forming a thought”? Well, we have to “think about it”—in effect by forming another thought.

Put another way: imagine we are trying to determine whether a piston has “come to a conclusion” about pressure in a gas. Particularly if there’s microscopic reversibility, the piston and things around it will “continue wiggling around”, and it’ll “take an observer” to determine whether the “heat is dissipated” to the point where one can “read out the result”.

But how do we break out of what seems like an infinite regress? The point is that whatever mind is ultimately forming the impression that is “the observation” is inevitably the final arbiter. And, yes, this could mean that we’d always have to start discussing all sorts of details about photoreceptors and neurons and so on. But—as we’ve discussed at length—the key point that makes a general observer theory possible is that there are many conclusions that can be drawn for large classes of observers, quite independent of these details.

But, OK, what happens if we think about the raw ruliad? Now all we have are emes and elementary events updating the configuration of them. And in a sense we’re “fishing out of this” pieces that represent observers, and pieces that represent things they’re observing. Can we “assess the cost of observation” here? It really depends on the fundamental scale of what we consider to be observers. And in fact we might even think of our scale as observers (say measured in emes or elementary events) as defining a “fundamental constant of nature”—at least for the universe as we perceive it. But given this scale, we can, for example, ask for “consensus” to develop across it, or at least for “every eme in it to have had time to communicate with every other”.

In an attempt to formalize the “cost of observation” we’ll inevitably have to make what seem like arbitrary choices, just as we would in setting up a scheme to determine when an ongoing computational process has “generated an answer”. But if we assume a certain boundedness to our choices, we can expect that we’ll be able to draw definite conclusions, and in effect be able to construct an analog of computational complexity theory for processes of observation.

The Future of Observer Theory

My goal here has been to explore some of the key concepts and principles needed to create a framework that we can call observer theory. But what I’ve done is just the beginning, and there is much still to be done in fleshing out the theory and investigating its implications.

One important place to start is in making more explicit models of the “mechanics of observation”. At the level of the general theory, it’s all about equivalencing. But how specifically is that equivalencing achieved in particular cases? There are many thousands of kinds of sensors, measuring devices, analysis methods, etc. All of these should be systematically inventoried and classified. And in each case there’s a metamodel to be made, that clarifies just how equivalencing is achieved, and, for example, what separation of physical (or other) scales makes it possible.

Human experience and human minds are the inspiration—and ultimate grounding—for our concept of an observer. And insofar as neural nets trained on what amounts to human experience have emerged as somewhat faithful models for what human minds do, we can expect to use them as a fairly detailed proxy for observers like us. So, for example, we can imagine exploring things like quantum observers by studying multiway generalizations of neural nets. (And this is something that becomes easier if instead of organizing their data into real-number weights we can “atomize” neural nets into purely discrete elements.)

Such investigations of potentially realistic models provide a useful “practical grounding” for observer theory. But to develop a general observer theory we need a more formal notion of an observer. And there is no doubt a whole abstract framework—perhaps using methods from areas like category theory—that can be developed purely on the basis of our concept of observers being about equivalencing.

But to understand the connection of observer theory to things like science as done by us humans, we need to tighten up what it means to be an “observer like us”. What exactly are all the general things we “believe about ourselves”? As we discussed above, many we so much take for granted that it’s challenging for us to identify them as actually just “beliefs” that in principle don’t have to be that way.

But I suspect that the more we can tighten up our definition of “observers like us”, the more we’ll be able to explain why we perceive the world the way we do, and attribute to it the laws and properties we do. Is there some feature of us as observers, for example, that makes us “parse” the physical world as being three-dimensional? We could represent the same data about what’s out there by assigning a one-dimensional (“space-filling”) coordinate to everything. But somehow observers like us don’t do that. And instead, in effect, we “probe the ruliad” by sampling it in what we perceive as 3D slices. (And, yes, the most obvious coarse graining just considers progressively larger geodesic balls, say in the spatial hypergraphs that appear in our Physics Project—but that’s probably at best just an approximation to the sampling observers like us do.)
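Here is one concrete way to assign a "one-dimensional coordinate". A simple space-filling ordering is the Z-order (Morton) curve, which interleaves the bits of x, y and z into a single integer. The post does not say which curve it has in mind, so treat this Python sketch as one illustrative choice; a Hilbert curve would be another. The function names are hypothetical.

```python
def spread_bits(v, bits):
    """Move bit i of v to bit position 3*i, leaving two zero bits between
    consecutive bits so three coordinates can be interleaved."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out

def morton_encode(x, y, z, bits=10):
    """Interleave the low `bits` bits of non-negative x, y, z into one
    Z-order (Morton) index."""
    return spread_bits(x, bits) | (spread_bits(y, bits) << 1) | (spread_bits(z, bits) << 2)

def morton_decode(code, bits=10):
    """Invert morton_encode, recovering (x, y, z) from a single index."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z

if __name__ == "__main__":
    # Walk the 1D index through a 2x2x2 cube and show the 3D points it visits.
    print([morton_decode(c, bits=1) for c in range(8)])
```

Every point of a 2^b × 2^b × 2^b cube gets a distinct index, so no data is lost. However, points that are adjacent in 3D can land far apart in the 1D ordering.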

As part of our Physics Project we’ve discovered that the structure of the three main theories of twentieth-century physics (statistical mechanics, general relativity and quantum mechanics) can be derived from properties of the ruliad just by knowing that observers like us are computationally bounded and believe we’re persistent in time. But how might we reach, say, the Standard Model of particle physics—with all its particular values of parameters, etc.? Some may be inevitable, given the underlying structure of our theory. But others, one suspects, are in effect reflections of aspects of us as observers. They are “derivable”, but only given our particular character—or beliefs—as observers. And, yes, presumably things like the “constant of nature” that characterizes “our size in emes” will appear in the laws we attribute to the universe as we perceive it.

And, by the way, these considerations of “observers like us” extend beyond physical observers. Thus, for example, as we tighten up our characterization of what we’re like as mathematical observers, we can expect that this will constrain the “possible laws of our mathematical universe”. We might have thought that we could “pick whatever axioms we want”, in effect sampling the ruliad to get any mathematics we want. But, presumably, observers like us can’t do this—so that questions like “Is the continuum hypothesis true?” can potentially have definite answers for any observers like us, and for any coherent mathematics that we build.

But in the end, do we really have to consider observers whose characteristics are grounded in human experience? We already reflexively generalize our own personal experiences to those of other humans. But can we go further? We don’t have the internal experience of being a dog, an ant colony, a computer, or an ocean. And typically at best we anthropomorphize such things, trying to reduce the behavior we perceive in them to elements that align with our own human experience.

But are we as humans just stuck with a particular kind of “internal experience”? The growth of technology—and in particular sensors and measuring devices—has certainly expanded the range of inputs that can be delivered to our brains. And the growth of our collective knowledge about the world has expanded our ways of representing and thinking about things. Right now those are basically our only ways of modifying our detailed “internal experience”. But what if we were to connect directly—and internally—into our brains?

Presumably, at least at first, we’d need the “neural user interface” to be familiar—and we’d be forced into, for example, concentrating everything into a single thread of experience. But what if we allowed “multiway experience”? Well, of course our brains are already made up of billions of neurons that each do things. But it seems to be a core feature of human experience that we concentrate those things to give a single thread of experience. And that seems to be an essential feature of being an “observer like us”.

That kind of concentration also happens in a flock of birds, an ant colony—or a human society. In all these cases, each individual organism “does their thing”. But somehow collective “decisions” get made, with many different detailed situations getting equivalenced together to leave only the “final decision”. So that means that from the outside, the system behaves as we would expect of an “observer like us”. Internally, that kind of “observer behavior” is happening “above the experience” of each single individual. But still, at the level of the “hive mind” it’s behavior typical of an observer like us.

That’s not to say, though, that we can readily imagine what it’s like to be a system like this, or even to be one of its parts. And in the effort to explore observer theory an important direction is to try to imagine ourselves having a different kind of experience than we do, and from “within” that experience, to try to see what kind of laws we would attribute, say, to the physical universe.

In the early twentieth century, particularly in the context of relativity and quantum mechanics, it became clear that being “more realistic” about the observer was crucial in moving forward in science. Things like computational irreducibility—and even more so, our Physics Project—take that another step.

One used to imagine that science should somehow be “fundamentally objective”, and independent of all aspects of the observer. But what’s become clear is that it’s not. And that the nature of us as observers is actually crucial in determining what science we “experience”. But the crucial point is that there are often powerful conclusions that can be drawn even without knowing all the details of an observer. And that’s a central reason for building a general observer theory—in effect to give an objective way of formally and robustly characterizing what one might consider to be the subjective element in science.

Note

There are no doubt many precursors of varying directness that can be found to the things I discuss here; I have not attempted a serious historical survey. In my own work, a notable precursor from 2002 is Chapter 10 of A New Kind of Science, entitled “Processes of Perception and Analysis”. I thank many people involved with our Wolfram Physics Project for related discussions, including Xerxes Arsiwalla, Hatem Elshatlawy and particularly Jonathan Gorard.

Aggregation and Tiling as Multicomputational Processes

Stephen Wolfram — Fri, 03 Nov 2023 22:32:12 +0000

The Importance of Multiway Systems

It’s all about systems where there can in effect be many possible paths of history. In a typical standard computational system like a cellular automaton, there’s always just one path, defined by evolution from one state to the next. But in a multiway system, there can be many possible next states—and thus many possible paths of history. Multiway systems have a central role in our Physics Project, particularly in connection with quantum mechanics. But what’s now emerging is that multiway systems in fact serve as a quite general foundation for a whole new “multicomputational” paradigm for modeling.

My objective here is twofold. First, I want to use multiway systems as minimal models for growth processes based on aggregation and tiling. And second, I want to use this concrete application as a way to develop further intuition about multiway systems in general. Elsewhere I have explored multiway systems for strings, multiway systems based on numbers, multiway Turing machines, multiway combinators, multiway expression evaluation and multiway systems based on games and puzzles. But in studying multiway systems for aggregation and tiling, we’ll be dealing with something that is immediately more physical and tangible.

When we think of “growth by aggregation” we typically imagine a “random process” in which new pieces get added “at random” to something. But each of these “random possibilities” in effect defines a different path of history. And the concept of a multiway system is to capture all those possibilities together. In a typical random (or “stochastic”) model one’s just tracing a single path of history, and one imagines one doesn’t have enough information to say which path it will be. But in a multiway system one’s looking at all the paths. And in doing so, one’s in a sense making a model for the “whole story” of what can happen.

The choice of a single path can be “nondeterministic”. But the whole multiway system is deterministic. And by studying that “deterministic whole” it’s often possible to make useful, quite general statements.

One can think of a particular moment in the evolution of a multiway system as giving something like an ensemble of states of the kind studied in statistical mechanics. But the general concept of a multiway system, with its discrete branching at discrete steps, depends on a level of fundamental discreteness that’s quite unfamiliar from traditional statistical mechanics—though is perfectly straightforward to define in a computational, or even mathematical, way.

For aggregation it’s easy enough to set up a minimal discrete model—at least if one allows explicit randomness in the model. But a major point of what we’ll do here is to “go above” that randomness, setting up our model in terms of a whole, deterministic multiway system.

What can we learn by looking at this whole multiway system? Well, for example, we can see whether there’ll always be growth—whatever the random choices may be—or whether the growth will sometimes, or even always, stop. And in many practical applications (think, for example, tumors) it can be very important to know whether growth always stops—or through what paths it can continue.

A lot of what we’ll at first do here involves seeing the effect of local constraints on growth. Later on, we’ll also look at effects of geometry, and we’ll study how objects of different shapes can aggregate, or ultimately tile.

The models we’ll introduce are in a sense very minimal—combining the simplest multiway structures with the simplest spatial structures. And with this minimality it’s almost inevitable that the models will show up as idealizations of all sorts of systems—and as foundations for good models of these systems.

At first, multiway systems can seem rather abstract and difficult to grasp—and perhaps that’s inevitable given our human tendency to think sequentially. But by seeing how multiway systems play out in the concrete case of growth processes, we get to build our intuition and develop a more grounded view—that will stand us in good stead in exploring other applications of multiway systems, and in general in coming to terms with the whole multicomputational paradigm.

The Simplest Case

It’s the ultimate minimal model for random discrete growth (often called the Eden model). On a square grid, start with one black cell, then at each step randomly attach a new black cell somewhere onto the growing “cluster”:
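Here is a minimal Python sketch of this growth process. It assumes the common variant in which every empty cell orthogonally adjacent to the cluster is equally likely to be filled next; other Eden-model variants instead weight sites by the number of boundary edges they share with the cluster. The function name `eden_cluster` is hypothetical. The growth sites are kept in a list with an index, so each step takes roughly constant time and 10,000 steps run quickly.

```python
import random

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def eden_cluster(steps, seed=None):
    """Start from one cell at the origin; at each step fill a uniformly random
    empty cell orthogonally adjacent to the cluster."""
    rng = random.Random(seed)
    cluster = {(0, 0)}
    frontier = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # empty cells touching the cluster
    index = {cell: i for i, cell in enumerate(frontier)}
    for _ in range(steps):
        i = rng.randrange(len(frontier))
        cell = frontier[i]
        # O(1) removal: move the last frontier entry into slot i, then pop.
        last = frontier.pop()
        if last != cell:
            frontier[i] = last
            index[last] = i
        del index[cell]
        cluster.add(cell)
        # New empty neighbors of the added cell join the frontier.
        x, y = cell
        for dx, dy in NEIGHBORS:
            nb = (x + dx, y + dy)
            if nb not in cluster and nb not in index:
                index[nb] = len(frontier)
                frontier.append(nb)
    return cluster

if __name__ == "__main__":
    print(len(eden_cluster(10_000, seed=0)), "cells")
```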

After 10,000 steps we might get:

But what are all the possible things that can happen? For that, we can construct a multiway system:

A lot of these clusters differ only by a trivial translation; canonicalizing by translation we get

or after another step:

If we also reduce out rotations and reflections we get

or after another step:

The set of possible clusters after t steps is just the set of possible polyominoes (or “square lattice animals”) with t cells. The number of these for successive t is

growing roughly like k^t for large t, with k a little larger than 4:

By the way, canonicalization by translation always reduces the number of possible clusters by a factor of t. Canonicalization by rotation and reflection can reduce the number by a factor of 8 if the cluster has no symmetry (which for large clusters becomes increasingly likely), and by a smaller factor the more symmetry the cluster has, as in:
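These counts can be checked with a short enumeration. This Python sketch uses hypothetical names throughout. It starts from a single cell at the origin and builds the uncanonicalized multiway levels by attaching one cell in every possible way. It then counts the states three ways: with no canonicalization, up to translation, and up to translation, rotation and reflection.

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

# The 8 symmetries of the square lattice: 4 rotations and 4 reflections.
SYMMETRIES = (
    lambda x, y: (x, y), lambda x, y: (-y, x),
    lambda x, y: (-x, -y), lambda x, y: (y, -x),
    lambda x, y: (-x, y), lambda x, y: (y, x),
    lambda x, y: (x, -y), lambda x, y: (-y, -x),
)

def children(cluster):
    """Successor states in the multiway system: attach one new cell anywhere
    orthogonally adjacent to the cluster."""
    return {
        cluster | {(x + dx, y + dy)}
        for (x, y) in cluster
        for (dx, dy) in NEIGHBORS
        if (x + dx, y + dy) not in cluster
    }

def by_translation(cells):
    """Canonical form up to translation: shift so the minimum x and y are 0."""
    cells = list(cells)
    mx = min(x for x, _ in cells)
    my = min(y for _, y in cells)
    return frozenset((x - mx, y - my) for x, y in cells)

def by_symmetry(cells):
    """Canonical form up to translation, rotation and reflection."""
    return min(
        tuple(sorted(by_translation(f(x, y) for x, y in cells)))
        for f in SYMMETRIES
    )

def multiway_counts(t_max):
    """Rows (t, uncanonicalized, up to translation, up to all symmetries)."""
    level = {frozenset({(0, 0)})}
    rows = []
    for t in range(1, t_max + 1):
        rows.append((t, len(level),
                     len({by_translation(c) for c in level}),
                     len({by_symmetry(c) for c in level})))
        if t < t_max:
            level = {child for c in level for child in children(c)}
    return rows

if __name__ == "__main__":
    for row in multiway_counts(7):
        print(row)
```

The last two columns reproduce the standard fixed and free polyomino counts (1, 2, 6, 19, 63, 216, 760 and 1, 1, 2, 5, 12, 35, 108). The uncanonicalized count is always exactly t times the count up to translation, because each fixed polyomino can be placed over the origin cell in t ways.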

With canonicalization, the multiway graph after 7 steps has the form

and it doesn’t look any simpler with alternative rendering:

If we imagine that at each step, cells are added with equal probability at every possible position on the cluster, or equivalently that all outgoing edges from a given cluster in the uncanonicalized multiway graph are followed with equal probability, then we can get a distribution of probabilities for the distinct canonical clusters obtained—here shown after 7 steps:
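The distribution itself can be computed exactly by pushing probabilities through the uncanonicalized multiway levels. In this Python sketch, every growth position of a cluster gets equal probability, and clusters are then merged up to translation, rotation and reflection. The names are hypothetical, and the sketch uses `fractions.Fraction` for exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))
SYMMETRIES = (
    lambda x, y: (x, y), lambda x, y: (-y, x),
    lambda x, y: (-x, -y), lambda x, y: (y, -x),
    lambda x, y: (-x, y), lambda x, y: (y, x),
    lambda x, y: (x, -y), lambda x, y: (-y, -x),
)

def frontier(cluster):
    """Empty cells orthogonally adjacent to the cluster (possible growth positions)."""
    return {(x + dx, y + dy) for (x, y) in cluster for (dx, dy) in NEIGHBORS} - cluster

def canonical(cells):
    """Canonical form up to translation, rotation and reflection."""
    forms = []
    for f in SYMMETRIES:
        moved = [f(x, y) for x, y in cells]
        mx = min(x for x, _ in moved)
        my = min(y for _, y in moved)
        forms.append(tuple(sorted((x - mx, y - my) for x, y in moved)))
    return min(forms)

def cluster_distribution(t):
    """Exact probability of each canonical cluster with t cells, choosing every
    growth position of the current cluster with equal probability."""
    dist = {frozenset({(0, 0)}): Fraction(1)}
    for _ in range(t - 1):
        nxt = defaultdict(Fraction)
        for cluster, p in dist.items():
            options = frontier(cluster)
            for cell in options:
                nxt[cluster | {cell}] += p / len(options)
        dist = nxt
    result = defaultdict(Fraction)
    for cluster, p in dist.items():
        result[canonical(cluster)] += p
    return dict(result)

if __name__ == "__main__":
    for shape, p in sorted(cluster_distribution(4).items(), key=lambda kv: -kv[1]):
        print(p, shape)
```

For three cells, for example, the straight tromino gets probability 1/3 and the L-shaped one gets 2/3. The reason is that each domino has six empty neighbors, and two of them extend it in a line.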

One feature of the large random cluster we saw at the beginning is that it has some holes in it. Clusters with holes start developing after 7 steps, with the smallest being:

This cluster can be reached through a subset of the multiway system:

And in fact in the limit of large clusters, the probability for there to be a hole seems to approach 1—even though the total fraction of area covered by holes approaches 0.

One way to characterize the “space of possible clusters” is to create a branchial graph by connecting every pair of clusters that have a common ancestor one step back in the multiway graph:

The connectedness of all these graphs reflects the fact that with the rule we’re using, it’s always possible at any step to go from one cluster to another by a sequence of delete-one-cell/add-one-cell changes.
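The branchial graph construction is easy to state in code. This Python sketch uses hypothetical names and uncanonicalized clusters grown from a single cell. It joins every pair of children of the same parent with an edge, then uses a breadth-first search to check connectivity.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def children(cluster):
    """All clusters obtained by attaching one orthogonally adjacent cell."""
    return {
        cluster | {(x + dx, y + dy)}
        for (x, y) in cluster
        for (dx, dy) in NEIGHBORS
        if (x + dx, y + dy) not in cluster
    }

def branchial_graph(parents):
    """Adjacency map for the step after `parents`: nodes are the child clusters,
    and two children are joined if some parent produces both."""
    adjacency = {}
    for parent in parents:
        kids = list(children(parent))
        for kid in kids:
            adjacency.setdefault(kid, set())
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency

def is_connected(adjacency):
    """Breadth-first search from an arbitrary node."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        for nb in adjacency[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(adjacency)

if __name__ == "__main__":
    level = {frozenset({(0, 0)})}
    for t in range(2, 6):
        graph = branchial_graph(level)
        edges = sum(len(v) for v in graph.values()) // 2
        print(t, len(graph), "nodes", edges, "edges", "connected:", is_connected(graph))
        level = set(graph)
```

For the first few steps this reports a single connected component. The sketch only checks small cases; it does not prove connectedness in general.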

The branchial graphs here also show a 4-fold symmetry resulting from the symmetry of the underlying lattice. Canonicalizing the states, we get smaller branchial graphs that no longer show any such symmetry:

Totalistically Constrained Growth (4-Cell Neighborhoods)

With the rule we’ve been discussing so far, a new cell to be attached can be anywhere on a cluster. But what if we limit growth, by requiring that new cells must have certain numbers of existing cells around them? Specifically, let’s consider rules that look at the neighbors around any given position, and allow a new cell there only if there are specified numbers of existing cells in the neighborhood.

Starting with a cross of black cells, here are some examples of random clusters one gets after 20 steps with all possible rules of this type (the initial “4” designates that these are 4-neighbor rules):

Rules that don’t allow new cells to end up with just one existing neighbor can only fill in corners in their initial conditions, and can’t grow any further. But any rule that allows growth with only one existing neighbor produces clusters that keep growing forever. And here are some random examples of what one can get after 10,000 steps:
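Here is a minimal Python sketch of such a totalistic constraint. It makes two assumptions. First, it reads the notation 4:{...} as "allow a new cell only if its number of occupied orthogonal neighbors is in this set". Second, it takes the initial "cross" to be the 5-cell plus shape. The function names are hypothetical.

```python
import random

VON_NEUMANN = ((1, 0), (-1, 0), (0, 1), (0, -1))

# Assumed initial condition: the 5-cell plus shape.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def growth_positions(cluster, allowed, neighborhood=VON_NEUMANN):
    """Empty cells next to the cluster whose number of occupied neighbors
    (in `neighborhood`) is in `allowed`, e.g. {1, 3} for rule 4:{1,3}."""
    candidates = {(x + dx, y + dy) for (x, y) in cluster for (dx, dy) in neighborhood} - set(cluster)
    return {
        (x, y) for (x, y) in candidates
        if sum((x + dx, y + dy) in cluster for (dx, dy) in neighborhood) in allowed
    }

def grow(initial, allowed, steps, seed=None, neighborhood=VON_NEUMANN):
    """Randomly grow a cluster under the constraint, stopping early if stuck."""
    rng = random.Random(seed)
    cluster = set(initial)
    for _ in range(steps):
        options = sorted(growth_positions(cluster, allowed, neighborhood))
        if not options:
            break
        cluster.add(rng.choice(options))
    return cluster

if __name__ == "__main__":
    for rule in ({1}, {2}, {1, 3}, {2, 3, 4}):
        print(rule, len(grow(CROSS, rule, 20, seed=0)), "cells after up to 20 steps")
```

With 4:{2}, growth from the cross fills the four corners and then stops at a 3×3 block. With 4:{1}, growth can always continue: the empty cell just to the right of a rightmost cell of the cluster has exactly one occupied neighbor.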

The last of these is the unconstrained (Eden model) rule we already discussed above. But let’s look more carefully at the first case—where there’s growth only if a new cell will end up with exactly one neighbor. The canonicalized multiway graph in this case is:

The possible clusters here correspond to polyominoes that are “always one cell wide” (in particular, they have no 2×2 blocks and no closed loops of cells), or, equivalently, have perimeter 2t + 2 at step t. The number of such canonicalized clusters grows like:

This is a decreasing fraction of the total number of polyominoes—implying that most large polyominoes do not take this “spindly” form.
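The perimeter characterization can be checked for small t. This Python sketch, with hypothetical names, grows all clusters allowed by 4:{1}. It then compares them, up to translation, with the unconstrained polyominoes whose perimeter is 2t + 2. The underlying reason for the match is that a connected cluster of t cells with t - 1 adjacent pairs has perimeter 4t - 2(t - 1) = 2t + 2, and rule 4:{1} adds exactly one adjacent pair per step.

```python
N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))

def shift(cells):
    """Canonicalize by translation so the minimum x and y are 0."""
    cells = list(cells)
    mx = min(x for x, _ in cells)
    my = min(y for _, y in cells)
    return frozenset((x - mx, y - my) for x, y in cells)

def neighbor_count(cell, cluster):
    x, y = cell
    return sum((x + dx, y + dy) in cluster for dx, dy in N4)

def level_sets(t_max, allowed=None):
    """Translation-canonical clusters with 1..t_max cells grown from one cell.
    allowed=None means unconstrained growth; otherwise a new cell needs a
    number of occupied neighbors in `allowed` (e.g. {1} for rule 4:{1})."""
    level = {frozenset({(0, 0)})}
    levels = [level]
    for _ in range(t_max - 1):
        nxt = set()
        for cluster in level:
            for (x, y) in cluster:
                for dx, dy in N4:
                    cell = (x + dx, y + dy)
                    if cell in cluster:
                        continue
                    if allowed is None or neighbor_count(cell, cluster) in allowed:
                        nxt.add(shift(cluster | {cell}))
        level = nxt
        levels.append(level)
    return levels

def perimeter(cluster):
    """Number of unit edges between cluster cells and empty cells."""
    return sum((x + dx, y + dy) not in cluster for (x, y) in cluster for dx, dy in N4)

if __name__ == "__main__":
    everything, spindly = level_sets(8), level_sets(8, allowed={1})
    for t in range(1, 9):
        print(t, len(spindly[t - 1]), "of", len(everything[t - 1]))
```

For t = 4 and 5, the sketch finds 18 of the 19 and 55 of the 63 fixed polyominoes.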

A new feature of a rule with constraints is that not all locations around a cluster may allow growth. Here is a version of the multiway system above, with cells around each cluster annotated with green if new growth is allowed there, and red if it never can be:

In a larger random cluster, we can see that with this rule, most of the interior is “dead” in the sense that the constraint of the rule allows no further growth there:

By the way, the clusters generated by this rule can always be directly represented by their “skeleton graphs”:

Looking at random clusters for all the (grow-with-1-neighbor) rules above, we see different patterns of holes in each case:

There are altogether five types of cells being distinguished here, reflecting different neighbor configurations:

Here’s a sample cluster generated with the 4:{1,3} rule:

Some cells already have too many neighbors, and so can never be added to the cluster. Others have exactly the right number of neighbors to be added immediately. Still others don’t currently have the right number of neighbors to grow, but if neighbors are filled in, they might be able to be added. Sometimes it will turn out that when neighbors of cells get filled in, they will actually prevent the cell from being added (so that it joins the cells that can never be added)—and in the particular case shown here that happens with the 2×2 blocks of cells.

The multiway graphs from the rules shown here are all qualitatively similar, but there are detailed differences. In particular, at least for many of the rules, an increasing number of states are “missing” relative to what one gets with the grow-in-all-cases 4:{1,2,3,4} rule—or, in other words, there are an increasing number of polyominoes that can’t be generated given the constraints:

The first polyomino that can’t be reached (which occurs at step 4) is:

At step 6 the polyominoes that can’t be reached for rules 4:{1,3} and 4:{1,3,4} are

while for 4:{1} and 4:{1,4} the additional polyomino

can also not be reached.

At step 8, the polyomino

is reachable with 4:{1} and 4:{1,3} but not with 4:{1,4} and 4:{1,3,4}.

Of some note is that none of the rules that exclude polyominoes can reach:

Totalistically Constrained Growth (8-Cell Neighborhoods)

What happens if one considers diagonal as well as orthogonal neighbors, giving a total of 8 neighbors around a cell? There are 256 possible rules in this case, corresponding to the possible subsets of Range[8]. Here are samples of what they do after 200 steps, starting from an initial cluster:

Two cases that at least initially show growth here are (the “8” designates that these are 8-neighbor rules):

In the {2} case, the multiway graph begins with:

One might assume that every branch in this graph would continue forever, and that growth would never “get stuck”. But it turns out that after 9 steps the following cluster is generated:

And with this cluster, no further growth is possible: no positions around the boundary have exactly 2 neighbors. In the multiway graph up to 10 steps, it turns out this is the only “terminal cluster” that can be generated—out of a total of 1115 possible clusters:
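Checking whether a cluster is terminal is a purely local test. This Python sketch, with hypothetical names, does three things. It uses the 8-cell (Moore) neighborhood, lists the 256 rules as subsets of {1, ..., 8}, and explores the multiway system breadth-first, canonicalizing by translation and collecting clusters with no allowed growth positions. How "steps" and the initial cluster are counted are assumptions here, so the totals it reports need not match the figures quoted above.

```python
from itertools import combinations

# The 8 cells around (0, 0): orthogonal and diagonal neighbors.
MOORE = tuple((dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0))

# All 256 totalistic 8-neighbor rules: every subset of {1, ..., 8}.
ALL_RULES = [frozenset(s) for r in range(9) for s in combinations(range(1, 9), r)]

def growth_positions(cluster, allowed, neighborhood=MOORE):
    """Empty cells around the cluster whose occupied-neighbor count is in `allowed`."""
    candidates = {(x + dx, y + dy) for (x, y) in cluster for (dx, dy) in neighborhood} - set(cluster)
    return {
        (x, y) for (x, y) in candidates
        if sum((x + dx, y + dy) in cluster for (dx, dy) in neighborhood) in allowed
    }

def is_terminal(cluster, allowed, neighborhood=MOORE):
    """True if the rule allows no further growth anywhere around the cluster."""
    return not growth_positions(cluster, allowed, neighborhood)

def shift(cells):
    """Canonicalize by translation so the minimum x and y are 0."""
    cells = list(cells)
    mx = min(x for x, _ in cells)
    my = min(y for _, y in cells)
    return frozenset((x - mx, y - my) for x, y in cells)

def terminal_clusters(initial, allowed, steps, neighborhood=MOORE):
    """Breadth-first multiway exploration for `steps` steps. Returns the terminal
    clusters found (up to translation) and the number of distinct clusters seen."""
    level = {shift(initial)}
    seen = set(level)
    terminal = set()
    for _ in range(steps):
        nxt = set()
        for cluster in level:
            options = growth_positions(cluster, allowed, neighborhood)
            if not options:
                terminal.add(cluster)  # stuck: no branch continues from here
            for cell in options:
                nxt.add(shift(cluster | {cell}))
        seen |= nxt
        level = nxt
    terminal |= {c for c in level if is_terminal(c, allowed, neighborhood)}
    return terminal, len(seen)
```

For example, `terminal_clusters({(0, 0), (1, 0), (2, 0), (1, 1)}, {2}, 9)` explores 9 steps of the 8:{2} rule from a 4-cell T.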

So how is that terminal cluster reached? Here’s the fragment of multiway graph that leads to it:

If we don’t prune off all the ways to “go astray”, the fragment appears as part of a larger multiway graph:

And if one follows all paths in the unpruned (and uncanonicalized) multiway graph at random (i.e. at each step, one chooses each branch with equal probability), it turns out that the probability of ever reaching this particular terminal cluster is just:

(And the fact that this number is fairly small implies that the system is far from confluent; there are many paths that, for example, don’t converge to the fixed point corresponding to this terminal cluster.)
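Reaching probabilities like this can be computed exactly. The method is to follow every branch of the uncanonicalized multiway graph with equal probability and merge paths that lead to the same cluster. The following is a minimal Python sketch with hypothetical names, using `fractions.Fraction` for exact arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

VON_NEUMANN = ((1, 0), (-1, 0), (0, 1), (0, -1))
MOORE = tuple((dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0))

def growth_positions(cluster, allowed, neighborhood=VON_NEUMANN):
    """Empty cells around the cluster whose occupied-neighbor count is in `allowed`."""
    candidates = {(x + dx, y + dy) for (x, y) in cluster for (dx, dy) in neighborhood} - set(cluster)
    return {
        (x, y) for (x, y) in candidates
        if sum((x + dx, y + dy) in cluster for (dx, dy) in neighborhood) in allowed
    }

def terminal_probabilities(initial, allowed, neighborhood, max_steps):
    """Random walk through the multiway graph, each branch equally likely.
    Returns ({terminal cluster: probability}, probability still growing)."""
    dist = {frozenset(initial): Fraction(1)}
    reached = defaultdict(Fraction)
    for _ in range(max_steps):
        nxt = defaultdict(Fraction)
        for cluster, p in dist.items():
            options = growth_positions(cluster, allowed, neighborhood)
            if not options:
                reached[cluster] += p  # this path ends at a terminal cluster
                continue
            for cell in options:
                nxt[cluster | {cell}] += p / len(options)
        dist = nxt
    still_growing = Fraction(0)
    for cluster, p in dist.items():
        if growth_positions(cluster, allowed, neighborhood):
            still_growing += p
        else:
            reached[cluster] += p
    return dict(reached), still_growing
```

As a sanity check, 4:{2} started from the 5-cell cross reaches its 3×3 terminal block with probability 1. Any probability mass not yet absorbed by a terminal cluster after `max_steps` is returned separately.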

If we keep going in the evolution of the multiway system, we’ll reach other terminal clusters; after 12 steps the following have appeared:

For the {3} rule above, the multiway system takes a little longer to “get going”:

Once again there are terminal clusters where the system gets stuck; the first of them appears at step 14:

And also once again the terminal cluster appears as an isolated node in the whole multiway system:

The fragment of multiway graph that leads to it is:

So far we’ve been finding terminal clusters by waiting for them to appear in the evolution of the multiway system. But there’s another approach, similar to what one might use in filling in something like a tiling. The idea is that every cell in a terminal cluster must have neighbors that don’t allow further growth. In other words, the terminal cluster must consist of certain “local tiles” for which the constraints don’t allow growth. But what configurations of local tiles are possible? To determine this, we turn the matching conditions for the tiles into logical expressions whose variables are True or False depending on whether particular positions in the template do or do not contain cells in the cluster. By solving the satisfiability problem for the combination of these logical expressions, one finds configurations of cells that could conceivably correspond to terminal clusters.
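For very small regions, the same question can be answered without a SAT solver, by brute force. The minimal Python sketch below does this. It enumerates every subset of a `size`×`size` box and keeps the ones where no empty cell next to the cluster has an allowed neighbor count. It also keeps only clusters that are connected under 8-cell adjacency, which is what growth from a connected seed produces. This exhaustive search is a swapped-in technique, not the satisfiability approach described above. It only scales to tiny boxes: a 6×6 region would already mean 2^36 subsets. The function names, the translation-only canonical form, and the assumption that 0 is not an allowed count are all illustrative.

```python
from itertools import product

# Offsets of the 8 Moore neighbors (orthogonal and diagonal)
MOORE = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def count(cell, cluster):
    """Number of occupied Moore neighbors of `cell`."""
    x, y = cell
    return sum((x + dx, y + dy) in cluster for dx, dy in MOORE)


def is_terminal(cluster, allowed):
    """True if no empty cell next to the cluster has an allowed neighbor count."""
    border = {(x + dx, y + dy) for x, y in cluster for dx, dy in MOORE} - cluster
    return all(count(c, cluster) not in allowed for c in border)


def is_connected(cluster):
    """Connectivity under Moore adjacency, by flood fill from any cell."""
    start = next(iter(cluster))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in MOORE:
            nxt = (x + dx, y + dy)
            if nxt in cluster and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(cluster)


def normalize(cluster):
    """Translate so the smallest x and y coordinates are 0."""
    mx = min(x for x, _ in cluster)
    my = min(y for _, y in cluster)
    return frozenset((x - mx, y - my) for x, y in cluster)


def terminal_clusters(allowed, size):
    """Connected terminal clusters fitting in a size x size box, up to translation."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    found = set()
    for bits in product((False, True), repeat=len(cells)):
        cluster = frozenset(c for c, b in zip(cells, bits) if b)
        if cluster and is_connected(cluster) and is_terminal(cluster, allowed):
            found.add(normalize(cluster))
    return found
```

A 4×4 box (65,536 subsets) is still quick. Beyond that, encoding the same local conditions as Boolean constraints and handing them to a solver is what keeps the search feasible.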

Following this procedure for the {2} rule with regions of up to 6×6 cells, we find:

But now there’s an additional constraint. Assuming one starts from a connected initial cluster, any subsequent cluster generated must also be connected. Removing the non-connected cases we get:

So given these terminal clusters, what initial conditions can lead to them? To determine this we effectively have to invert the aggregation process—giving in the end a multiway graph that includes all initial conditions that can generate a given terminal cluster. For the smallest terminal cluster we get:

Our 4-cell “T” initial condition appears here—but we see that there are also even smaller 2-cell initial conditions that lead to the same terminal cluster.

For all the terminal clusters we showed before, we can construct the multiway graphs starting with the minimal initial clusters that lead to them:

For terminal clusters like

there’s no nontrivial multiway system to show, since these clusters can only appear as initial conditions; they can never be generated in the evolution.
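Inverting one step of aggregation amounts to asking which cell could have been the last one added. The minimal Python sketch below assumes an 8-neighbor totalistic rule 8:S and connected clusters. It removes each cell in turn and keeps the remainder if that remainder is connected and the removed cell has an allowed number of neighbors in it. A cluster with no such predecessor can only appear as an initial condition. The names `predecessors` and `initial_only` are hypothetical helpers, not functions from the original work.

```python
# Offsets of the 8 Moore neighbors (orthogonal and diagonal)
MOORE = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def count(cell, cluster):
    """Number of occupied Moore neighbors of `cell`."""
    x, y = cell
    return sum((x + dx, y + dy) in cluster for dx, dy in MOORE)


def is_connected(cluster):
    """Connectivity under Moore adjacency, by flood fill from any cell."""
    start = next(iter(cluster))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in MOORE:
            nxt = (x + dx, y + dy)
            if nxt in cluster and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(cluster)


def predecessors(cluster, allowed):
    """Connected clusters P such that rule 8:allowed can add one cell c to P
    to give `cluster` (c must have an allowed number of neighbors in P)."""
    cluster = frozenset(cluster)
    result = []
    for c in sorted(cluster):
        rest = cluster - {c}
        if rest and is_connected(rest) and count(c, rest) in allowed:
            result.append(rest)
    return result


def initial_only(cluster, allowed):
    """True if no connected cluster can grow into this one in a single step."""
    return not predecessors(cluster, allowed)
```

Applying `predecessors` repeatedly, starting from a terminal cluster, builds up the backward multiway graph. Under 8:{2}, for instance, a horizontal 2-cell domino has no predecessor. That is because each of its cells has only one neighbor in the other.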

There are quite a few small clusters that can only appear as initial conditions, and do not have preimages under the aggregation rule. Here are the cases that fit in a 3×3 region:

The case of the {3} rule is fairly similar to the {2} rule. The possible terminal clusters up to 5×5 are:

However, most of these have only a fairly limited set of possible preimages:

For example we have:

And indeed beyond the (size-17) example we already showed above, no other terminal clusters that can be generated from a T initial condition appear here. Sampling further, however, additional terminal clusters appear (beginning at size 25):

The fragments of multiway graphs for the first few of these are:

Random Evolution

We’ve seen above that for the rules we’ve been investigating, terminal clusters are quite rare among possible states in the multiway system. But what happens if we just evolve at random? How often will we wind up with a terminal cluster? When we say “evolve at random”, what we mean is that at each step we’re going to look at all possible positions where a new cell could be added to the cluster that exists so far, and then we’re going to pick with equal probability at which of these to actually add the new cell.
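The random evolution just described can be sketched in a few lines of Python. The sketch assumes an 8-neighbor totalistic rule 8:S given as a set of allowed neighbor counts. At each step it lists every position where a cell could be added and picks one uniformly at random. The function names and the `(cluster, steps, terminated)` return shape are illustrative assumptions.

```python
import random

# Offsets of the 8 Moore neighbors (orthogonal and diagonal)
MOORE = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def candidates(cluster, allowed):
    """Empty cells whose number of occupied Moore neighbors is in `allowed`."""
    border = {(x + dx, y + dy) for x, y in cluster for dx, dy in MOORE} - cluster
    return sorted(
        c for c in border
        if sum((c[0] + dx, c[1] + dy) in cluster for dx, dy in MOORE) in allowed
    )


def random_evolution(seed, allowed, max_steps, rng=None):
    """Add one uniformly chosen candidate cell per step.

    Returns (cluster, steps_taken, terminated), where `terminated` means
    no candidate positions remain.
    """
    rng = rng if rng is not None else random.Random()
    allowed = frozenset(allowed)
    cluster = set(seed)
    for step in range(max_steps):
        options = candidates(cluster, allowed)
        if not options:
            return cluster, step, True
        cluster.add(rng.choice(options))  # every candidate equally likely
    return cluster, max_steps, not candidates(cluster, allowed)
```

Recomputing the candidates from scratch costs time proportional to the cluster size at every step. Runs of tens of thousands of steps would therefore call for maintaining the candidate set incrementally as cells are added.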

For the 8:{3} rule something surprising happens. Even though terminal clusters are rare in its multiway graph, it turns out that regardless of its initial conditions, it always eventually reaches a terminal cluster—though it often takes a while. And here, for example, are a few possible terminal clusters, annotated with the number of steps it took to reach them (which is also equal to the number of cells they contain):

The distribution of the number of steps to termination seems to be very roughly exponential (here based on a sample of 10,000 random cases)—with mean lifetime around 2300 and half-life around 7400:

Here’s an example of a large terminal cluster—that takes 21,912 steps to generate:

And here’s a map showing when growth in different parts of this cluster occurred (with blue being earliest and red being latest):

This picture suggests that different parts of the cluster “actively grow” at different times, and if we look at a “spacetime” plot of where growth occurs as a function of time, we can confirm this:

And indeed this suggests that different parts of the cluster are at first “fertile”, but later inevitably “burn out”—so that in the end there are no possible positions left where growth can occur.

But what shapes can the final terminal clusters form? We can get some idea by looking at a “compactness measure” (of the kind often used to study gerrymandering) that roughly gives the standard deviation of the distances from the center of each cluster to each of the cells in it. Both “very stringy” and “roughly circular” clusters are fairly rare; most clusters lie somewhere in between:

If we look not at the 8:{3} but instead at the 8:{2} rule, things are very different. Once again, it’s possible to reach a terminal cluster, as the multiway graph shows. But now random evolution almost never reaches a terminal cluster, and instead almost always “runs away” to generate an infinite cluster. The clusters generated in this case are typically much more “compact” than in the 8:{3} case

and this is also reflected in the “spacetime” version:

Parallel Growth and Causal Graphs

In building up our clusters so far, we’ve always been assuming that cells are added sequentially, one at a time. But if two cells are far enough apart, we can actually add them “simultaneously”, in parallel, and end up building the same cluster. We can think of the addition of each cell as being an “event” that updates the state of the cluster. Then—just like in our Physics Project, and other applications of multicomputation—we can define a causal graph that represents the causal dependencies between these events, and then foliations of this causal graph tell us possible overall sequences of updates, including parallel ones.

As an example, consider this sequence of states in the “always grow” 4:{1,2,3,4} rule—where at each step the cell that’s new is colored red (and we’re including the “nothing” state at the beginning):

Every transition between successive states defines an event:

There’s then causal dependence of one event on another if the cell added in the second event is adjacent to the one added in the first event. So, for example, there are causal dependencies like

and

where in the second case additional “spatially separated” cells have been added that aren’t involved in the causal dependence. Putting all the causal dependencies together, we get the complete causal graph for this evolution:

We can recover our original sequence of states by picking a particular ordering of these events (here indicated by the positions of the cells they add):

This path has the property that it always follows the direction of causal edges—and we can make that more obvious by using a different layout for the causal graph:

But in general we can use any ordering of events consistent with the causal graph. Another ordering (out of a total of 40,320 possibilities in this case) is

which gives the sequence of states

with the same final cluster configuration, but different intermediate states.
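Both the causal graph and the count of compatible orderings can be computed directly from a growth sequence. The minimal Python sketch below follows the definition above for a 4-neighbor rule: event j depends on an earlier event i when the cell added by j is orthogonally adjacent to the cell added by i. It then counts the orderings that respect every causal edge (the linear extensions of the causal graph) by dynamic programming over sets of completed events. Event 0 here simply adds the first cell, which plays the role of the step from the “nothing” state. The function names are illustrative. The cell sequence behind the 40,320 figure is only shown in the figure and is not reproduced here.

```python
from functools import lru_cache


def causal_edges(cells):
    """Causal edges (i, j), i < j, for a sequence of added cells: event j
    depends on event i when cells[j] is orthogonally adjacent to cells[i]."""
    edges = []
    for j, (xj, yj) in enumerate(cells):
        for i in range(j):
            xi, yi = cells[i]
            if abs(xi - xj) + abs(yi - yj) == 1:
                edges.append((i, j))
    return edges


def count_orderings(n, edges):
    """Count orderings of n events that respect every causal edge
    (linear extensions), by dynamic programming over completed-event sets."""
    needs = [0] * n
    for i, j in edges:
        needs[j] |= 1 << i  # bitmask of events that must precede event j
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def ways(done):
        if done == full:
            return 1
        return sum(
            ways(done | (1 << e))
            for e in range(n)
            if not done & (1 << e) and (needs[e] & done) == needs[e]
        )

    return ways(0)
```

The subset-based table has 2^n entries, so this is only practical for a few dozen events at most. That is enough, though, to check small examples like the one above.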

But now the point is that the constraints implied by the causal graph do not require all events to be applied sequentially. Some events can be considered “spacelike separated” and so can be applied simultaneously. And in fact, any foliation of the causal graph defines a certain sequence for applying events—either sequentially or in parallel. So, for example, here is one particular foliation of the causal graph (shown with two different renderings for the causal graph):

And here is the corresponding sequence of states obtained:

And since in some slices of this foliation multiple events happen “in parallel”, it’s “faster” to get to the final configuration. (As it happens, this foliation is like a “cosmological rest frame foliation” in our Physics Project, and involves the maximum possible number of events happening on each slice.)
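One simple way to build a foliation in the spirit of the one just described is to place every event on the earliest slice that comes after all the events it depends on. This packs events onto early slices and so minimizes the total number of slices. The minimal Python sketch below does this for a causal graph given as a list of edges `(i, j)`. It assumes events are numbered so that every edge has `i < j`, as happens when edges are built from a growth sequence. The function name is an illustrative assumption, not the code behind the figures.

```python
def generational_slices(n, edges):
    """Group events into slices, each event on the earliest slice after
    every event it depends on (assumes every edge (i, j) has i < j)."""
    depth = [0] * n
    # Sorting by target means depth[i] is final before any edge leaving i is used.
    for i, j in sorted(edges, key=lambda e: e[1]):
        depth[j] = max(depth[j], depth[i] + 1)
    slices = [[] for _ in range(max(depth, default=-1) + 1)]
    for event, d in enumerate(depth):
        slices[d].append(event)
    return slices
```

Events that land in the same slice have no causal path between them. They can therefore be applied in parallel, and replaying the slices in order always reproduces the same final cluster.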

Different foliations (and there are a total of 678,972 possibilities in this case) will give different sequences of states, but always the same final state:

Note that nothing we’ve done here depends on the particular rule we’ve used. So, for example, for the 8:{2} rule with sequence of states

the causal graph is:

It’s worth commenting that everything we’ve done here has been for particular sequences of states, i.e. particular paths in the multiway graph. And in effect what we’re doing is the analog of classical spacetime physics—tracing out causal dependencies in particular evolution histories. But in general we could look at the whole multiway causal graph, with events that are not only timelike or spacelike separated, but also branchlike separated. And if we make foliations of this graph, we’ll end up not only with “classical” spacetime states, but also “quantum” superposition states that would need to be represented by something like multispace (in which at each spatial position, there is a “branchial stack” of possible cell values).

The One-Dimensional Case

So far we’ve been considering aggregation processes in two dimensions. But what about one dimension? In 1D, a “cluster” just consists of a sequence of cells. The simplest rule allows a cell to be added whenever it’s adjacent to a cell that’s already there. Starting from a single cell, here’s a possible random evolution according to such a rule, shown evolving down the page:

We can also construct the multiway system for this rule:

Canonicalizing the states gives the trivial multiway graph:

But just like in the 2D case, things get less trivial if there are constraints on growth. For example, assume that before placing a new cell we count the number of cells that lie either distance 1 or distance 2 away. If growth is allowed only when this number is exactly 1, we get behavior like:

The corresponding multiway system is

or after canonicalization:

The number of distinct sequences after t steps here is given by

which can be expressed in terms of Fibonacci numbers, and for large t is about .

The rule in effect generates all possible Morse-code-like sequences, consisting of runs of either 2-cell (“long”) black blocks or 1-cell (“short”) black blocks, separated by “gaps” of single white cells.
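The 1D constrained rule above can be sketched in Python with a cluster represented as a set of integer positions. A new cell may be added only if exactly one of the four positions at distance 1 or 2 from it is occupied. States are identified up to translation only. That is an assumption, so counts may differ from a convention that also identifies mirror images. The function names are illustrative.

```python
OFFSETS = (-2, -1, 1, 2)  # positions at distance 1 or 2


def children(cluster):
    """Clusters obtained by adding one empty cell that has exactly one
    occupied cell among the positions at distance 1 or 2 from it."""
    result = []
    for p in {c + d for c in cluster for d in OFFSETS} - cluster:
        if sum((p + d) in cluster for d in OFFSETS) == 1:
            result.append(cluster | {p})
    return result


def canonical(cluster):
    """Translation-only canonical form: shift so the leftmost cell is at 0."""
    left = min(cluster)
    return frozenset(c - left for c in cluster)


def distinct_sequences(t):
    """Canonical clusters reachable in exactly t steps from a single cell."""
    states = {frozenset({0})}
    for _ in range(t):
        states = {canonical(child) for s in states for child in children(s)}
    return states
```

After two steps this gives the three patterns {0, 1, 3}, {0, 2, 3} and {0, 2, 4}, which read as long-short, short-long and short-short-short in the Morse-code picture.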

The branchial graphs for this system have the form:

Looking at random evolutions for all possible rules of this type we get:

The corresponding canonicalized multiway graphs are:

The rules we’ve looked at so far are purely totalistic: whether a new cell can be added depends only on the total number of cells in its neighborhood. But (much like, for example, in cellular automata) it’s also possible to have rules where whether one can add a new cell depends on the complete configuration of cells in a neighborhood. Mostly, however, such rules seem to behave very much like totalistic ones.

Other generalizations include, for example, rules with multiple “colors” of cells, and rules that depend either on the total number of cells of different colors, or their detailed configurations.

The Three-Dimensional Case

The kind of analysis we’ve done for 2D and 1D aggregation systems can readily be extended to 3D. As a first example, consider a rule in which cells can be added along each of the 6 coordinate directions in a 3D grid whenever they are adjacent to an existing cell. Here are some typical examples of random clusters formed in this case:

Taking successive slices through the first of these (and coloring by “age”) we get:

If we allow a cell to be added only when it is adjacent to just one existing cell (corresponding to the rule 6:{1}) we get clusters that from the outside look almost indistinguishable

but which have an “airier” internal structure:

Much like in 2D, with 6 neighbors, there can’t be unbounded growth unless cells can be added when there is just one cell in the neighborhood. But in analogy to what happens in 2D, things get more complicated when we allow “corner adjacency” and have a 26-cell neighborhood.
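Going to 3D only changes the neighborhood. Here is a minimal Python sketch of the candidate-position test for rules written k:S, with k = 6 (face neighbors) or k = 26 (face, edge and corner neighbors). The offset lists and function names are illustrative assumptions.

```python
from itertools import product

# Face-adjacent (6) and face/edge/corner-adjacent (26) neighbor offsets
FACE6 = [d for d in product((-1, 0, 1), repeat=3) if sum(map(abs, d)) == 1]
MOORE26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def shift(p, d):
    """Add offset d to grid point p."""
    return tuple(a + b for a, b in zip(p, d))


def candidates_3d(cluster, allowed, offsets):
    """Empty cells whose number of occupied neighbors (under `offsets`)
    is in `allowed`; (offsets, allowed) play the role of k:S."""
    cluster = set(cluster)
    border = {shift(c, d) for c in cluster for d in offsets} - cluster
    return sorted(
        p for p in border
        if sum(shift(p, d) in cluster for d in offsets) in allowed
    )
```

For example, a 3×3×3 starting block for the 26:{9} rule can be written as `set(product(range(3), repeat=3))` and passed with `{9}` and `MOORE26`.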

If cells can be added whenever there’s at least one adjacent cell, the results are similar to the 6-neighbor case, except that now there can be “corner-adjacent outgrowths”

and the whole structure is “still airier”:

Little qualitatively changes for a rule like 26:{2} where growth can occur only with exactly 2 neighbors (here starting with a 3D dimer):

But the general question of when there is growth, and when not, is quite complicated and subtle. In particular, even with a specific rule, there are often some initial conditions that can lead to unbounded growth, and others that cannot.

Sometimes there is growth for a while, but then it stops. For example, with the rule 26:{9}, one possible path of evolution from a 3×3×3 block is:

The full multiway graph in this case terminates, confirming that from this initial condition no unbounded growth is possible:

With other initial conditions, however, this rule can grow for longer (here shown every 10 steps):

And from what one can tell, all rules 26:{n} lead to unbounded growth for , and do not for .

Polygonal Shapes

So far, we’ve been looking at “filling in cells” in grids—in 2D, 1D and 3D. But we can also look at just “placing tiles” without a grid, with each new tile attaching edge to edge to an existing tile.

For square tiles, there isn’t really a difference:

And the multiway system is just the same as for our original “grow anywhere” rule on a 2D grid:

Here’s now what happens for triangular tiles:

The multiway graph now generates all polyiamonds (triangular polyforms):

And since equilateral triangles can tessellate in a regular lattice, we can think of this—like the square case—as “filling in cells in a lattice” rather than just “placing tiles”. Here are some larger examples of random clusters in this case:

Essentially the same happens with regular hexagons:

The multiway graph generates all polyhexes:

Here are some examples of larger clusters—showing somewhat more “tendrils” than the triangular case:

And in an “effectively lattice” case like this we could also go on and impose constraints on neighborhood configurations, much as we did in earlier sections above.

But what happens if we consider shapes that do not tessellate the plane—like regular pentagons? We can still “sequentially place tiles” with the constraint that any new tile can’t overlap an existing one. And with this rule we get for example:

Here are some “randomly grown” larger clusters—showing all sorts of irregularly shaped interstices inside:

(And, yes, generating such pictures correctly is far from trivial. In the “effectively lattice” case, coincidences between polygons are fairly easy to determine exactly. But in something like the pentagon case, doing so requires solving equations in a high-degree algebraic number field.)

The multiway graph, however, does not show any immediately obvious differences from the ones for “effectively lattice” cases:

It makes it slightly easier to see what’s going on if we riffle the results on the last step we show:

The branchial graphs in this case have the form:

Here’s a larger cluster formed from pentagons:

And remember that this is built by sequentially adding one pentagon at each step, testing every “exposed edge” and seeing in which cases a pentagon will “fit”. As in all our other examples, there is no preference given to “external” versus “internal” edges.

Note that whereas “effectively lattice” clusters always eventually fill in all their holes, this isn’t true for something like the pentagon case. And in this case it appears that in the limit, about 28% of the overall area is taken up by holes. And, by the way, there’s a definite “zoo” of at least small possible holes, here plotted with their (logarithmic) probabilities:

So what happens with other regular polygons? Here’s an example with octagons (and in this case the limiting total area taken up by holes is about 35%):

And, by the way, here’s the “zoo of holes” in this case:

With pentagons, it’s pretty clear that difficult-to-resolve geometrical situations will arise. And one might have thought that octagons would avoid these. But there are still plenty of strange “mismatches” like

that aren’t easy to characterize or analyze. By the way, one should note that any time a “closed hole” is formed, the vectors corresponding to the edges that form its boundary must sum to zero—in effect defining an equation.
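Here is one way to make the equation explicit. Assume unit-side regular n-gons, with the first tile having an edge along the x-axis. Each new tile is the mirror image of an existing tile across the shared edge, so every edge in the cluster, traversed in either direction, points at an angle that is an integer multiple of π/n. A hole boundary with edge directions k₁π/n, k₂π/n, … therefore closes exactly when the sum of exp(i·kⱼπ/n) over its edges is zero. The minimal Python sketch below checks this condition numerically. The function name and tolerance are assumptions, and a floating-point tolerance is only a heuristic: deciding such coincidences reliably needs exact algebraic arithmetic.

```python
import cmath
import math


def closes(direction_indices, n, tol=1e-9):
    """True if unit edges pointing at angles k*pi/n (k in direction_indices)
    sum to (numerically) zero, i.e. the boundary they trace closes up."""
    total = sum(cmath.exp(1j * math.pi * k / n) for k in direction_indices)
    return abs(total) < tol
```

For example, `closes([0, 2, 4, 6, 8], 5)` holds for the boundary of a single pentagon, while dropping any one of its edges breaks the closure.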

When the number of sides in the regular polygon gets large, our clusters will approximate circle packings. Here’s an example with 12-gons:

But of course because we’re insisting on adding one polygon at a time, the resulting structure is much “airier” than a true circle packing—of the kind that would be obtained (at least in 2D) by “pushing on the edges” of the cluster.

Polyomino Tilings

In the previous section we considered “sequential tilings” constructed from regular polygons. But the methods we used are quite general, and can be applied to sequential tilings formed from any shape—or shapes (or, at least, any shapes for which “attachment edges” can be identified).

As a first example, consider a domino or dimer shape—which we assume can be oriented both vertically and horizontally:

Here’s a somewhat larger cluster formed from dimers:

Here’s the canonicalized multiway graph in this case:

And here are the branchial graphs:

So what about other polyomino shapes? What happens when we try to sequentially tile with these—effectively making “polypolyominoes”?

Here’s an example based on an L-shaped polyomino:

Here’s a larger cluster

and here’s the canonicalized multiway graph after just 1 step

and after 2 steps:

The only other 3-cell polyomino is the straight tromino:

(For dimers, the limiting fraction of area covered by holes seems to be about 17%, while for the L and straight trominoes, it’s about 27%.)

Going to 4 cells, there are 5 possible polyominoes—and here are samples of random clusters that can be built with them (note that in the last case shown, we require only that “subcells” of the 2×2 polyomino must align):

The corresponding multiway graphs are:

Continuing for more steps in a few cases:

Some polyominoes are “more awkward” to fit together than others—so these typically give clusters of “lower density”:

So far, we’ve always considered adding new polyominoes so that they “attach” on any “exposed edge”. And the result is that we can often get long “tendrils” in our clusters of polyominoes. But an alternative strategy is to try to add polyominoes as “compactly” as possible, in effect by adding successive “rings” of polyominoes (with “older” rings here colored bluer):

In general there are many ways to add these rings, and eventually one will often get stuck, unable to add polyominoes without leaving holes—as indicated by the red annotation here:

Of course, that doesn’t mean that if one were prepared to “backtrack and try again”, one couldn’t find a way to extend the cluster without leaving holes. And indeed for the polyomino we’re looking at here it’s perfectly possible to end up with “perfect tilings” in which no holes are left:

In general, we could consider all sorts of different strategies for growing clusters by adding polyominoes “in parallel”—just like in our discussion of causal graphs above. And if we add polyominoes “a ring at a time”, we’re effectively making a particular choice of foliation—in which the successive “ring states” turn out to be directly analogous to what we call “generational states” in our Physics Project.

If we allow holes (and don’t impose other constraints), then it’s inevitable that—just with ordinary, sequential aggregation—we can grow an unboundedly large cluster of polyominoes of any shape, just by always attaching one edge of each new polyomino to an “exposed” edge of the existing cluster. But if we don’t allow holes, it’s a different story—and we’re talking about a traditional tiling problem, where there are ultimately cases where tiling is impossible, and only limited-size clusters can be generated.
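The ordinary sequential aggregation just described, with holes allowed, can be sketched in Python for any polyomino. The sketch lists the polyomino's distinct orientations (rotations, plus mirror images if requested). It then finds every non-overlapping placement that shares at least one cell edge with the cluster and picks one uniformly at random at each step. Choosing uniformly among distinct placements is an assumption about how “random” attachment is weighted. The function names are illustrative.

```python
import random

ORTHO = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def normalize(cells):
    """Translate so the smallest x and y coordinates are 0."""
    mx = min(x for x, _ in cells)
    my = min(y for _, y in cells)
    return frozenset((x - mx, y - my) for x, y in cells)


def orientations(cells, allow_reflections=False):
    """Distinct rotations (and optionally mirror images) of a polyomino."""
    shapes, current = set(), list(cells)
    for _ in range(4):
        current = [(-y, x) for x, y in current]  # rotate by 90 degrees
        shapes.add(normalize(current))
        if allow_reflections:
            shapes.add(normalize([(-x, y) for x, y in current]))
    return sorted(shapes, key=sorted)


def placements(occupied, shapes):
    """Non-overlapping placements sharing at least one cell edge with `occupied`."""
    frontier = {(x + dx, y + dy) for x, y in occupied for dx, dy in ORTHO} - occupied
    found = set()
    for shape in shapes:
        for fx, fy in frontier:
            for sx, sy in shape:  # put shape cell (sx, sy) on frontier cell
                placed = frozenset((x + fx - sx, y + fy - sy) for x, y in shape)
                if not placed & occupied:
                    found.add(placed)
    return sorted(found, key=sorted)


def random_polyomino_cluster(cells, steps, rng, allow_reflections=False):
    """Each step adds one copy of the polyomino, chosen uniformly among all
    valid placements; holes are allowed."""
    shapes = orientations(cells, allow_reflections)
    occupied = set(shapes[0])
    pieces = [shapes[0]]
    for _ in range(steps):
        piece = rng.choice(placements(occupied, shapes))
        occupied |= piece
        pieces.append(piece)
    return pieces
```

Because holes are allowed, a valid placement always exists on the outside of a finite cluster, so this process never gets stuck. Forbidding holes instead turns it into the tiling problem described above.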

As it happens, all polyominoes with 6 or fewer cells do allow infinite tilings. But with 7 cells the following do not:

It’s perfectly possible to grow random clusters with these polyominoes—but they tend not to be at all compact, and to have lots of holes and tendrils:

So what happens if we try to grow clusters in rings? Here are all the possible ways to “surround” the first of these polyominoes with a “single ring”:

And it turns out in every single case, there are edges (indicated here in red) where the cluster can’t be extended—thereby demonstrating that no infinite tiling is possible with this particular polyomino.

By the way, much like we saw with constrained growth on a grid, it’s possible to have “tiling regions” that can extend only a certain limited distance, then always get stuck.

It’s worth mentioning that we’ve considered here the case of single polyominoes. It’s also possible to consider being able to add a whole set of possible polyominoes—“Tetris style”.

Nonperiodic Tilings

We’ve looked at polyominoes—and shapes like pentagons—that don’t tile the plane. But what about shapes that can tile the plane, but only nonperiodically? As an example, let’s consider Penrose tiles. The basic shapes of these tiles are

though there are additional matching conditions (implicitly indicated by the arrows on each tile), which can be enforced either by putting notches in the tiles or by decorating the tiles:

Starting with these individual tiles, we can build up a multiway system by attaching tiles wherever the matching rules are satisfied (note that all edges of both tiles are the same length):

So how can we tell that these tiles can form a nonperiodic tiling? One approach is to generate a multiway system in which at successive steps we surround clusters with rings in all possible ways:

Continuing for another step we get:

Notice that here some of the branches have died out. But the question is what branches exist that will continue forever, and thus lead to an infinite tiling? To answer this we have to do a bit of analysis.

The first step is to see what possible “rings” can have formed around the original tile. And we can read all of these off from the multiway graph:

But now it’s convenient to look not at possible rings around a tile, but instead at possible configurations of tiles that can surround a single vertex. There turns out to be the following limited set:

The last two of these configurations have the feature that they can’t be extended: no tile can be added on the center of their “blue sides”. But it turns out that all the other configurations can be extended—though only to make a nested tiling, not a periodic one.

And a first indication of this is that larger copies of tiles (“supertiles”) can be drawn on top of the first three configurations we just identified, in such a way that the vertices of the supertiles coincide with vertices of the original tiles:

And now we can use this to construct rules for a substitution system:

Applying this substitution system builds up a nested tiling that can be continued forever:

But is such a nested tiling the only one that is possible with our original tiles? We can prove that it is by showing that every tile in every possible configuration occurs within a supertile. We can pull out possible configurations from the multiway system—and then in each case it turns out that we can indeed find a supertile in which the original tile occurs:

And what this all means is that the only infinite paths that can occur in the multiway system are ones that correspond to nested tilings; all other paths must eventually die out.
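One quick numerical consequence of building the tiling with a substitution system is that tile counts grow as powers of a fixed matrix. The minimal Python sketch below assumes the commonly quoted whole-tile counts for the Penrose rhomb substitution: each thick rhomb yields 2 thick and 1 thin, and each thin rhomb yields 1 thick and 1 thin. These counts are an assumption, not something read off the supertile figures above. The sketch tracks how many tiles of each kind appear after repeated substitution. The names `RULE`, `substitute` and `counts_after` are hypothetical.

```python
# Assumed whole-tile substitution counts for Penrose rhombs (an assumption,
# not read off the figures): thick -> 2 thick + 1 thin, thin -> 1 thick + 1 thin
RULE = {"thick": {"thick": 2, "thin": 1}, "thin": {"thick": 1, "thin": 1}}


def substitute(counts, rule=RULE):
    """Apply one substitution step to a dictionary of tile counts."""
    new = {tile: 0 for tile in rule}
    for tile, k in counts.items():
        for child, m in rule[tile].items():
            new[child] += k * m
    return new


def counts_after(steps, start="thick"):
    """Tile counts after repeatedly substituting, starting from one tile."""
    counts = {tile: int(tile == start) for tile in RULE}
    for _ in range(steps):
        counts = substitute(counts)
    return counts
```

Under this assumed rule the thick-to-thin ratio runs 2/1, 5/3, 13/8, … and converges to the golden ratio, an irrational number. A periodic tiling would have a rational ratio, so this serves as a quick consistency check that tilings built this way are nonperiodic.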

The Penrose tiling involves two distinct tiles. But in 2022 it was discovered that—if one’s allowed to flip the tile over—just a single (“hat”) tile is sufficient to force a nonperiodic tiling:

The full multiway graph obtained from this tile (and its flip-over) is complicated, but many paths in it lead (at least eventually) to “dead ends” which cannot be further extended. Thus, for example, the following configurations—which appear early in the multiway graph—all have the property that they can’t occur in an infinite tiling:

In the first case here, we can successively add a few rings of tiles:

But after 7 rings, there is a “contradiction” on the boundary, and no further growth is possible (as indicated by the red annotations):

Having eliminated cases that always lead to “dead ends”, the resulting simplified multiway graph effectively includes all joins between hat tiles that can ultimately lead to surviving configurations:

Once again we can define a supertile transformation

where the region outlined in red can potentially overlap another supertile. Now we can construct a multiway graph for the supertile (in its “bitten out” and full variant)

and can see that there is a (one-to-one) map between the multiway graph for the original tiles and the one for these supertiles:

And now from this we can tell that there can be arbitrarily large nested tilings using the hat tile:

Personal Notes

Tucked away on page 979 of my 2002 book A New Kind of Science is a note (written in 1995) on “Generalized aggregation models”:

And in many ways the current piece is a three-decade-later followup to that note—using a new approach based on multiway systems.

In A New Kind of Science I did discuss multiway systems (both abstractly, and in connection with fundamental physics). But what I said about aggregation was mostly in a section called “The Phenomenon of Continuity” which discussed how randomness could on a large scale lead to apparent continuity. That section began by talking about things like random walks, but went on to discuss the same minimal (“Eden model”) example of “random aggregation” that I give here. And then, in an attempt to “spruce up” my discussion of aggregation, I started looking at “aggregation with constraints”. In the main text of the book I gave just two examples:

But then for the footnote I studied a wider range of constraints (enumerating them much as I had cellular automata)—and noticed the surprising phenomenon that with some constraints the aggregation process could end up getting stuck, and not being able to continue.

For years I carried around the idea of investigating that phenomenon further. And it was often on my list as a possible project for a student to explore at the Wolfram Summer School. Occasionally it was picked, and progress was made in various directions. And then a few years ago, with our Physics Project in the offing, the idea arose of investigating it using multiway systems—and there were Summer School projects that made progress on this. Meanwhile, as our Physics Project progressed, our tools for working with multiway systems greatly improved—ultimately making possible what we’ve done here.

By the way, back in the 1990s, one of the many topics I studied for A New Kind of Science was tilings. And in an effort to determine what tilings were possible, I investigated what amounts to aggregation under tiling constraints—which is in fact even a generalization of what I consider here:

Thanks

First and foremost, I’d like to thank Brad Klee for extensive help with this piece, as well as Nik Murzin for additional help. (Thanks also to Catherine Wolfram, Christopher Wolfram and Ed Pegg for specific pointers.) I’d like to thank various Wolfram Summer School students (and their mentors) who’ve worked on aggregation systems and their multiway interpretation in recent years: Kabir Khanna 2019 (mentors: Christopher Wolfram & Jonathan Gorard), Lina M. Ruiz 2021 (mentors: Jesse Galef & Xerxes Arsiwalla), Pietro Pepe 2023 (mentor: Bob Nachbar). (Also related are the Summer School projects on tilings by Bowen Ping 2023 and Johannes Martin 2023.)

How to Think Computationally about AI, the Universe and Everything

Stephen Wolfram — Fri, 27 Oct 2023 19:47:41 +0000

Transcript of a talk at TED AI on October 17, 2023, in San Francisco

Human language. Mathematics. Logic. These are all ways to formalize the world. And in our century there’s a new and yet more powerful one: computation.

And for nearly 50 years I’ve had the great privilege of building an ever taller tower of science and technology based on that idea of computation. And today I want to tell you some of what that’s led to.

There’s a lot to talk about—so I’m going to go quickly… sometimes with just a sentence summarizing what I’ve written a whole book about.

You know, I last gave a TED talk thirteen years ago—in February 2010—soon after Wolfram|Alpha launched.

And I ended that talk with a question: is computation ultimately what’s underneath everything in our universe?

I gave myself a decade to find out. And actually it could have needed a century. But in April 2020—just after the decade mark—we were thrilled to be able to announce what seems to be the ultimate “machine code” of the universe.

And, yes, it’s computational. So computation isn’t just a possible formalization; it’s the ultimate one for our universe.

It all starts from the idea that space—like matter—is made of discrete elements. And that the structure of space and everything in it is just defined by the network of relations between these elements—that we might call atoms of space. It’s very elegant—but deeply abstract.

But here’s a humanized representation:

A version of the very beginning of the universe. And what we’re seeing here is the emergence of space and everything in it by the successive application of very simple computational rules. And, remember, those dots are not atoms in any existing space. They’re atoms of space—that are getting put together to make space. And, yes, if we kept going long enough, we could build our whole universe this way.

Eons later here’s a chunk of space with two little black holes, that eventually merge, radiating ripples of gravitational radiation:

And remember—all this is built from pure computation. But like fluid mechanics emerging from molecules, what emerges here is spacetime—and Einstein’s equations for gravity. Though there are deviations that we just might be able to detect. Like that the dimensionality of space won’t always be precisely 3.

And there’s something else. Our computational rules can inevitably be applied in many ways, each defining a different thread of time—a different path of history—that can branch and merge:

But as observers embedded in this universe, we’re branching and merging too. And it turns out that quantum mechanics emerges as the story of how branching minds perceive a branching universe.

The little pink lines here show the structure of what we call branchial space—the space of quantum branches. And one of the stunningly beautiful things—at least for a physicist like me—is that the same phenomenon that in physical space gives us gravity, in branchial space gives us quantum mechanics.

In the history of science so far, I think we can identify four broad paradigms for making models of the world—that can be distinguished by how they deal with time.

In antiquity—and in plenty of areas of science even today—it’s all about “what things are made of”, and time doesn’t really enter. But in the 1600s came the idea of modeling things with mathematical formulas—in which time enters, but basically just as a coordinate value.

Then in the 1980s—and this is something in which I was deeply involved—came the idea of making models by starting with simple computational rules and then just letting them run:

Can one predict what will happen? No, there’s what I call computational irreducibility: in effect the passage of time corresponds to an irreducible computation that we have to run to know how it will turn out.

But now there’s something even more: in our Physics Project things become multicomputational, with many threads of time, that can only be knitted together by an observer.

It’s a new paradigm—that actually seems to unlock things not only in fundamental physics, but also in the foundations of mathematics and computer science, and possibly in areas like biology and economics too.

You know, I talked about building up the universe by repeatedly applying a computational rule. But how is that rule picked? Well, actually, it isn’t. Because all possible rules are used. And we’re building up what I call the ruliad: the deeply abstract but unique object that is the entangled limit of all possible computational processes. Here’s a tiny fragment of it shown in terms of Turing machines:

OK, so the ruliad is everything. And we as observers are necessarily part of it. In the ruliad as a whole, everything computationally possible can happen. But observers like us can just sample specific slices of the ruliad.

And there are two crucial facts about us. First, we’re computationally bounded—our minds are limited. And second, we believe we’re persistent in time—even though we’re made of different atoms of space at every moment.

So then here’s the big result. What observers with those characteristics perceive in the ruliad necessarily follows certain laws. And those laws turn out to be precisely the three key theories of 20th-century physics: general relativity, quantum mechanics, and statistical mechanics and the Second Law.

It’s because we’re the kind of observers we are that we perceive the laws of physics we do.

We can think of different minds as being at different places in rulial space. Human minds that think alike are nearby. Animals are further away. And further out we get to alien minds where it’s hard to make a translation.

How can we get intuition for all this? We can use generative AI to take what amounts to an incredibly tiny slice of the ruliad—aligned with images we humans have produced.

We can think of this as a place in the ruliad described using the concept of a cat in a party hat:

Zooming out, we see what we might call “cat island”. But pretty soon we’re in interconcept space. Occasionally things will look familiar, but mostly we’ll see things we humans don’t have words for.

In physical space we explore more of the universe by sending out spacecraft. In rulial space we explore more by expanding our concepts and our paradigms.

We can get a sense of what’s out there by sampling possible rules—doing what I call ruliology:

Even with incredibly simple rules there’s incredible richness. But the issue is that most of it doesn’t yet connect with things we humans understand or care about. It’s like when we look at the natural world and only gradually realize we can use features of it for technology. Even after everything our civilization has achieved, we’re just at the very, very beginning of exploring rulial space.
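To make "sampling possible rules" concrete, here is a minimal Python sketch of elementary cellular automata run from a single black cell. It is not the Wolfram Language's built-in CellularAutomaton function, and the names step and run are hypothetical. Rule 30 is the classic case of a very simple rule producing complex-looking behavior. Rules 90 and 110 are included for comparison.

```python
def step(cells, rule):
    """Apply an elementary CA rule (0-255) once; cells beyond the edges are treated as 0.
    This padding is only faithful for rules that map 000 -> 0 (even rule numbers)."""
    n = len(cells)
    new = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        center = cells[i]
        right = cells[i + 1] if i < n - 1 else 0
        # The 3-cell neighborhood, read as a binary number, selects one bit of the rule number.
        index = (left << 2) | (center << 1) | right
        new.append((rule >> index) & 1)
    return new


def run(rule, steps):
    """Start from a single black cell; the width is chosen so the pattern never reaches the edges."""
    width = 2 * steps + 1
    cells = [0] * width
    cells[steps] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


for rule in (30, 90, 110):
    print(f"rule {rule}")
    for row in run(rule, 15):
        print("".join("#" if c else "." for c in row))
```

Looping over range(0, 256, 2) instead of the three chosen rules gives a crude, by-hand version of scanning a whole family of simple rules.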

But what about AIs? Just like we can do ruliology, AIs can in principle go out and explore rulial space. But left to their own devices, they’ll mostly be doing things we humans don’t connect with, or care about.

The big achievements of AI in recent times have been about making systems that are closely aligned with us humans. We train LLMs on billions of webpages so they can produce text that’s typical of what we humans write. And, yes, the fact that this works is undoubtedly telling us some deep scientific things about the semantic grammar of language—and generalizations of things like logic—that perhaps we should have known centuries ago.

You know, for much of human history we were kind of like LLMs, figuring things out by matching patterns in our minds. But then came more systematic formalization—and eventually computation. And with that we got a whole other level of power—to create truly new things, and in effect to go wherever we want in the ruliad.

But the challenge is to do that in a way that connects with what we humans—and our AIs—understand.

And in fact I’ve devoted a large part of my life to building that bridge. It’s all been about creating a language for expressing ourselves computationally: a language for computational thinking.

The goal is to formalize what we know about the world—in computational terms. To have computational ways to represent cities and chemicals and movies and formulas—and our knowledge about them.

It’s been a vast undertaking, one that’s spanned more than four decades of my life. It’s something unique. But I’m happy to report that in what has been Mathematica and is now the Wolfram Language, I think we have now firmly succeeded in creating a truly full-scale computational language.

In effect, every one of the functions here can be thought of as formalizing—and encapsulating in computational terms—some facet of the intellectual achievements of our civilization:

It’s the most concentrated form of intellectual expression I know: finding the essence of everything and coherently expressing it in the design of our computational language. For me personally it’s been an amazing journey, year after year building the tower of ideas and technology that’s needed—and nowadays sharing that process with the world on open livestreams.

A few centuries ago the development of mathematical notation, and what amounts to the “language of mathematics”, gave a systematic way to express math—and made possible algebra, and calculus, and ultimately all of modern mathematical science. And computational language now provides a similar path—letting us ultimately create a “computational X” for all imaginable fields X.

We’ve seen the growth of computer science—CS. But computational language opens up something ultimately much bigger and broader: CX. For 70 years we’ve had programming languages—which are about telling computers in their terms what to do. But computational language is about something intellectually much bigger: it’s about taking everything we can think about and operationalizing it in computational terms.

You know, I built the Wolfram Language first and foremost because I wanted to use it myself. And now when I use it, I feel like it’s giving me a superpower:

I just have to imagine something in computational terms and then the language almost magically lets me bring it into reality, see its consequences and then build on them. And, yes, that’s the superpower that’s let me do things like our Physics Project.

And over the past 35 years it’s been my great privilege to share this superpower with many other people, and by doing so to have enabled such an incredible number of advances across so many fields. It’s a wonderful thing to see people (researchers, CEOs, kids) using our language to think fluently in computational terms, sharpening their own thinking and then in effect automatically calling in computational superpowers.

And now it’s not just people who can do that. AIs can use our computational language as a tool too. Yes, to get their facts straight, but even more importantly, to compute new facts. There are already some integrations of our technology into LLMs—and there’s a lot more you’ll be seeing soon. And, you know, when it comes to building new things, a very powerful emerging workflow is basically to start by telling the LLM roughly what you want, then have it try to express that in precise Wolfram Language. Then—and this is a critical feature of our computational language compared to a programming language—you as a human can “read the code”. And if it does what you want, you can use it as a dependable component to build on.

OK, but let’s say we use more and more AI—and more and more computation. What’s the world going to be like? From the Industrial Revolution on, we’ve been used to doing engineering where we can in effect “see how the gears mesh” to “understand” how things work. But computational irreducibility now shows that won’t always be possible. We won’t always be able to make a simple human—or, say, mathematical—narrative to explain or predict what a system will do.

And, yes, this is science in effect eating itself from the inside. From all the successes of mathematical science we’ve come to believe that somehow, if only we could find them, there’d be formulas to predict everything. But now computational irreducibility shows that isn’t true: in effect, to find out what a system will do, we have to go through the same irreducible computational steps as the system itself.

Yes, it’s a weakness of science. But it’s also why the passage of time is significant—and meaningful. We can’t just jump ahead and get the answer; we have to “live the steps”.

It’s going to be a great societal dilemma of the future. If we let our AIs achieve their full computational potential, they’ll have lots of computational irreducibility, and we won’t be able to predict what they’ll do. But if we put constraints on them to make them predictable, we’ll limit what they can do for us.

So what will it feel like if our world is full of computational irreducibility? Well, it’s really nothing new—because that’s the story with much of nature. And what’s happened there is that we’ve found ways to operate within nature—even though nature can still surprise us.

And so it will be with the AIs. We might give them a constitution, but there will always be consequences we can’t predict. Of course, even figuring out societally what we want from the AIs is hard. Maybe we need a promptocracy where people write prompts instead of just voting. But basically every control-the-outcome scheme seems full of both political philosophy and computational irreducibility gotchas.

You know, if we look at the whole arc of human history, the one thing that’s systematically changed is that more and more gets automated. And LLMs just gave us a dramatic and unexpected example of that. So does that mean that in the end we humans will have nothing to do? Well, if you look at history, what seems to happen is that when one thing gets automated away, it opens up lots of new things to do. And as economies develop, the pie chart of occupations seems to get more and more fragmented.

And now we’re back to the ruliad. Because at a foundational level what’s happening is that automation is opening up more directions to go in the ruliad. And there’s no abstract way to choose between them. It’s just a question of what we humans want—and it requires humans “doing work” to define that.

A society of AIs untethered by human input would effectively go off and explore the whole ruliad. But most of what they’d do would seem to us random and pointless, much as most of nature today doesn’t seem to be “achieving a purpose”.

One used to imagine that to build things that are useful to us, we’d have to do it step by step. But AI, and the whole phenomenon of computation, tell us that what we really need is just to define what we want. Then computation, AI and automation can make it happen.

And, yes, I think the key to defining in a clear way what we want is computational language. You know, even after 35 years, for many people the Wolfram Language is still an artifact from the future. If your job is to program, it seems like a cheat: how come you can do in an hour what would usually take a week? But it can also be daunting, because having dashed off that one thing, you now have to conceptualize the next thing. Of course, it’s great for CEOs and CTOs and intellectual leaders who are ready to race onto the next thing. And indeed it’s impressively popular in that set.

In a sense, what’s happening is that Wolfram Language shifts from concentrating on mechanics to concentrating on conceptualization. And the key to that conceptualization is broad computational thinking. So how can one learn to do that? It’s not really a story of CS. It’s really a story of CX. And as a kind of education, it’s more like liberal arts than STEM. It’s part of a trend that when you automate technical execution, what becomes important is not figuring out how to do things—but what to do. And that’s more a story of broad knowledge and general thinking than any kind of narrow specialization.

You know, there’s an unexpected human-centeredness to all of this. We might have thought that with the advance of science and technology, the particulars of us humans would become ever less relevant. But we’ve discovered that that’s not true. And that in fact everything—even our physics—depends on how we humans happen to have sampled the ruliad.

Before our Physics Project we didn’t know if our universe really was computational. But now it’s pretty clear that it is. And from that we’re inexorably led to the ruliad—with all its vastness, so hugely greater than all the physical space in our universe.

So where will we go in the ruliad? Computational language is what lets us chart our path. It lets us humans define our goals and our journeys. And what’s amazing is that all the power and depth of what’s out there in the ruliad is accessible to everyone. One just has to learn to harness those computational superpowers. Which starts here. Our portal to the ruliad:

Expression Evaluation and Fundamental Physics

Stephen Wolfram — Fri, 29 Sep 2023 21:48:31 +0000

An Unexpected Correspondence

Enter any expression and it’ll get evaluated:

And internally—say in the Wolfram Language—what’s going on is that the expression is progressively being transformed using all available rules until no more rules apply. Here the process can be represented like this:

We can think of the yellow boxes in this picture as corresponding to “evaluation events” that transform one “state of the expression” (represented by a blue box) to another, eventually reaching the “fixed point” 12.
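The evaluation just described can be mimicked with a minimal Python sketch. It assumes expressions are nested tuples ("+", left, right) and that the only rule is integer addition. Choosing the leftmost-innermost sum at each step is an assumption intended to resemble the standard evaluation order. This is not the Wolfram Language evaluator itself, and all function names are hypothetical.

```python
def show(e, top=True):
    """Render a nested sum like (1 + (2 + 2)) + (3 + 4), omitting the outermost parentheses."""
    if isinstance(e, int):
        return str(e)
    s = f"{show(e[1], False)} + {show(e[2], False)}"
    return s if top else f"({s})"


def find_event(e, path=()):
    """Path to the leftmost-innermost sum of two integers, or None if no rule applies."""
    if isinstance(e, int):
        return None
    _, a, b = e
    if isinstance(a, int) and isinstance(b, int):
        return path
    for i, child in ((1, a), (2, b)):
        found = find_event(child, path + (i,))
        if found is not None:
            return found
    return None


def get(e, path):
    for i in path:
        e = e[i]
    return e


def replace(e, path, new):
    """Return a copy of e with the subexpression at path replaced by new."""
    if not path:
        return new
    parts = list(e)
    parts[path[0]] = replace(e[path[0]], path[1:], new)
    return tuple(parts)


def evaluate(e):
    """Apply one evaluation event at a time until a fixed point is reached."""
    states, events = [e], []
    while (path := find_event(e)) is not None:
        _, x, y = get(e, path)
        events.append(f"{x} + {y} -> {x + y}")
        e = replace(e, path, x + y)
        states.append(e)
    return e, states, events


expr = ("+", ("+", 1, ("+", 2, 2)), ("+", 3, 4))  # (1 + (2 + 2)) + (3 + 4)
result, states, events = evaluate(expr)
for state, event in zip(states, events):
    print(f"{show(state):<28} --[{event}]-->")
print(result)
```

Each recorded event plays the role of a yellow box, and each entry of states plays the role of a blue box. The loop stops once no sum of two integers remains.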

And so far this may all seem very simple. But actually there are many surprisingly complicated and deep issues and questions. For example, to what extent can the evaluation events be applied in different orders, or in parallel? Does one always get the same answer? What about non-terminating sequences of events? And so on.

I was first exposed to such issues more than 40 years ago—when I was working on the design of the evaluator for the SMP system that was the forerunner of Mathematica and the Wolfram Language. And back then I came up with pragmatic, practical solutions—many of which we still use today. But I was never satisfied with the whole conceptual framework. And I always thought that there should be a much more principled way to think about such things—that would likely lead to all sorts of important generalizations and optimizations.

Well, more than 40 years later I think we can finally now see how to do this. And it’s all based on ideas from our Physics Project—and on a fundamental correspondence between what’s happening at the lowest level in all physical processes and in expression evaluation. Our Physics Project implies that ultimately the universe evolves through a series of discrete events that transform the underlying structure of the universe (say, represented as a hypergraph)—just like evaluation events transform the underlying structure of an expression.

And given this correspondence, we can start applying ideas from physics—like ones about spacetime and quantum mechanics—to questions of expression evaluation. Some of what this will lead us to is deeply abstract. But some of it has immediate practical implications, notably for parallel, distributed, nondeterministic and quantum-style computing. And from seeing how things play out in the rather accessible and concrete area of expression evaluation, we’ll be able to develop more intuition about fundamental physics and about other areas (like metamathematics) where the ideas of our Physics Project can be applied.

Causal Graphs and Spacetime

The standard evaluator in the Wolfram Language applies evaluation events to an expression in a particular order. But typically multiple orders are possible; for the example above, there are three:

So what determines what orders are possible? There is ultimately just one constraint: the causal dependencies that exist between events. The key point is that a given event cannot happen unless all the inputs to it are available, i.e. have already been computed. So in the example here, the evaluation event 1 + 4 → 5 cannot occur unless the 2 + 2 → 4 one has already occurred. And we can summarize this by “drawing a causal edge” from the 2 + 2 → 4 event to the 1 + 4 → 5 one. Putting together all these “causal relations”, we can make a causal graph, which in the example here has the simple form (where we include a special “Big Bang” initial event to create the original expression that we’re evaluating):
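The causal edges for (1 + (2 + 2)) + (3 + 4) can be extracted mechanically. Below is a minimal Python sketch, again assuming nested tuples ("+", a, b). Each sum becomes one event, and an edge records that one event consumes a value another event produced. For simplicity it omits the “Big Bang” event, and the name causal_graph is hypothetical.

```python
from itertools import count


def causal_graph(expr):
    """One event per sum; an edge (u, v) means event v consumes the value that event u produced."""
    labels, edges = {}, []
    ids = count()

    def visit(e):
        # Returns (value, id of the event that produced it, or None for a literal in the input)
        if isinstance(e, int):
            return e, None
        _, a, b = e
        x, src_a = visit(a)
        y, src_b = visit(b)
        ev = next(ids)
        labels[ev] = f"{x} + {y} -> {x + y}"
        edges.extend((src, ev) for src in (src_a, src_b) if src is not None)
        return x + y, ev

    visit(expr)
    return labels, edges


expr = ("+", ("+", 1, ("+", 2, 2)), ("+", 3, 4))  # (1 + (2 + 2)) + (3 + 4)
labels, edges = causal_graph(expr)
for u, v in edges:
    print(f"{labels[u]}  ==>  {labels[v]}")
```

The output shows a chain 2 + 2 → 4, then 1 + 4 → 5, then 5 + 7 → 12. Alongside it, 3 + 4 → 7 feeds only the final event.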

What we see from this causal graph is that the events on the left must all follow each other, while the event on the right can happen “independently”. And this is where we can start making an analogy with physics. Imagine our events are laid out in spacetime. The events on the left are “timelike separated” from each other, because they are constrained to follow one after another, and so must in effect “happen at different times”. But what about the event on the right? We can think of this as being “spacelike separated” from the others, and happening at a “different place in space” asynchronously from the others.

As a quintessential example of a timelike chain of events, consider making the definition

and then generating the causal graph for the events associated with evaluating f[f[f[1]]] (i.e. Nest[f, 1, 3]):

A straightforward way to get spacelike events is just to “build in space” by giving an expression like f[1] + f[1] + f[1] that has parts that can effectively be thought of as being explicitly “laid out in different places”, like the cells in a cellular automaton:

But one of the major lessons of our Physics Project is that it’s possible for space to “emerge dynamically” from the evolution of a system (in that case, by successive rewriting of hypergraphs). And it turns out very much the same kind of thing can happen in expression evaluation, notably with recursively defined functions.

As a simple example, consider the standard definition of Fibonacci numbers:

With this definition, the causal graph for the evaluation of f[3] is then:

For f[5], dropping the “context” of each event, and showing only what changed, the graph is

while for f[8] the structure of the graph is:

So what is the significance of there being spacelike-separated parts in this graph? At a practical level, a consequence is that those parts correspond to subevaluations that can be done independently, for example in parallel. All the events (or subevaluations) in any timelike chain must be done in sequence. But spacelike-separated events (or subevaluations) don’t immediately have a particular relative order. The whole graph can be thought of as defining a partial ordering for all events—with the events forming a partially ordered set (poset). Our “timelike chains” then correspond to what are usually called chains in the poset. The antichains of the poset represent possible collections of events that can occur “simultaneously”.
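One rough way to quantify that parallelism is to count work (total events) and span (the longest timelike chain). The Python sketch below makes two assumptions. First, it uses the base case f[1] = f[2] = 1, since the post's own definition is not reproduced here. Second, it coarsens the causal graph to one event per addition, ignoring the other evaluation events the full graph contains. Events with the same “height” (the length of the longest chain ending at them) can never be causally related, so each height class is an antichain.

```python
from collections import Counter


def analyze(n, heights):
    """Evaluate f[n] = f[n-1] + f[n-2] (assumed base case f[1] = f[2] = 1), one event per addition.

    Appends to heights, for each addition event, the length of the longest causal chain
    ending at that event. Returns (value, height of the event producing f[n])."""
    if n <= 2:
        return 1, 0  # base case: no addition event
    a, ha = analyze(n - 1, heights)
    b, hb = analyze(n - 2, heights)
    h = 1 + max(ha, hb)  # must wait for the later of its two inputs
    heights.append(h)
    return a + b, h


heights = []
value, _ = analyze(8, heights)
work = len(heights)        # total events: time needed if done strictly one at a time
span = max(heights)        # longest timelike chain: time needed with unlimited parallelism
layers = Counter(heights)  # events of equal height form an antichain ("simultaneous")
print(value, work, span)
for h in sorted(layers):
    print(f"step {h}: {layers[h]} events")
```

In this coarse model f[8] takes 20 addition events one at a time, but only 6 steps if every antichain is done at once.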

And now there’s a deep analogy to physics. Because just like in the standard relativistic approach to spacetime, we can define a sequence of “spacelike surfaces” (or hypersurfaces in 3 + 1-dimensional spacetime) that correspond to possible successive “simultaneity surfaces” where events can consistently be done simultaneously. Put another way, any “foliation” of the causal graph defines a sequence of “time steps” in which particular collections of events occur—as in for example:

And just like in relativity theory, different foliations correspond to different choices of reference frames, or what amount to different choices of “space and time coordinates”. But at least in the examples we’ve seen so far, the “final result” from the evaluation is always the same, regardless of the foliation (or reference frame) we use—just as we expect when there is relativistic invariance.

As a slightly more complex—but ultimately very similar—example, consider the nestedly recursive function:

Now the causal graph for f[12] has the form

which again has both spacelike and timelike structure.

Foliations and the Definition of Time

Let’s go back to our first example above—the evaluation of (1 + (2 + 2)) + (3 + 4). As we saw above, the causal graph in this case is:

The standard Wolfram Language evaluator makes these events occur in the following order:

And by applying events in this order starting with the initial state, we can reconstruct the sequence of states that will be reached at each step by this particular evaluation process (where now we’ve highlighted in each state the part that’s going to be transformed at each step):

Here’s the standard evaluation order for the Fibonacci number f[3]:

And here’s the sequence of states generated from this sequence of events:

Any valid evaluation order has to eventually visit (i.e. apply) all the events in the causal graph. Here’s the path that’s traced out by the standard evaluation order on the causal graph for f[8]. As we’ll discuss later, this corresponds to a depth-first scan of the (directed) graph:

But let’s return now to our first example. We’ve seen the order of events used in the standard Wolfram Language evaluation process. But there are actually three different orders that are consistent with the causal relations defined by the causal graph (in the language of posets, each of these is a “total ordering”):
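These total orderings can be enumerated by brute force. The minimal Python sketch below tries every permutation of the four events and keeps those in which every causal edge points forward. The function name is hypothetical, and the factorial cost limits it to tiny graphs.

```python
from itertools import permutations


def total_orderings(nodes, edges):
    """Brute-force every linear extension of the causal partial order (fine for tiny graphs)."""
    orders = []
    for order in permutations(nodes):
        position = {event: i for i, event in enumerate(order)}
        # Keep the order only if every causal edge u -> v has u before v.
        if all(position[u] < position[v] for u, v in edges):
            orders.append(order)
    return orders


nodes = ["2 + 2 -> 4", "1 + 4 -> 5", "3 + 4 -> 7", "5 + 7 -> 12"]
edges = [("2 + 2 -> 4", "1 + 4 -> 5"), ("1 + 4 -> 5", "5 + 7 -> 12"), ("3 + 4 -> 7", "5 + 7 -> 12")]

for order in total_orderings(nodes, edges):
    print("  then  ".join(order))
```

The only freedom is where 3 + 4 → 7 goes relative to the chain 2 + 2 → 4, 1 + 4 → 5. That gives exactly three orders.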

And for each of these orders we can reconstruct the sequence of states that would be generated:

Up to this point we’ve always assumed that we’re just applying one event at a time. But whenever we have spacelike-separated events, we can treat such events as “simultaneous”, and apply them at the same step. And, just like in relativity theory, there are typically multiple possible choices of “simultaneity surfaces”. Each one corresponds to a certain foliation of our causal graph. And in the simple case we’re looking at here, there are only two possible (maximal) foliations:
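Two such slicings can be generated mechanically with Kahn-style layering. The minimal Python sketch below slices the graph by doing every event as early as possible, and then as late as possible. It omits the “Big Bang” event, and the function name is an illustrative assumption. For this small graph the two results appear to correspond to the two maximal foliations. In general, these are just two of many possible foliations.

```python
def slices(nodes, edges):
    """Kahn-style layering: each slice holds every event whose causal predecessors are all done."""
    preds = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)
    done, result = set(), []
    while len(done) < len(nodes):
        layer = [v for v in nodes if v not in done and preds[v] <= done]
        if not layer:
            raise ValueError("causal graph contains a cycle")
        result.append(layer)
        done.update(layer)
    return result


nodes = ["2 + 2 -> 4", "1 + 4 -> 5", "3 + 4 -> 7", "5 + 7 -> 12"]
edges = [("2 + 2 -> 4", "1 + 4 -> 5"), ("1 + 4 -> 5", "5 + 7 -> 12"), ("3 + 4 -> 7", "5 + 7 -> 12")]

earliest = slices(nodes, edges)                           # every event as early as possible
latest = slices(nodes, [(v, u) for u, v in edges])[::-1]  # every event as late as possible
print(earliest)
print(latest)
```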

From such foliations we can reconstruct possible total orderings of individual events just by enumerating possible permutations of events within each slice of the foliation (i.e. within each simultaneity surface). But we only really need a total ordering of events if we’re going to apply one event at a time. Yet the whole point is that we can view spacelike-separated events as being “simultaneous”. Or, in other words, we can view our system as “evolving in time”, with each “time step” corresponding to a successive slice in the foliation.

And with this setup, we can reconstruct states that exist at each time step—interspersed by updates that may involve several “simultaneous” (spacelike-separated) events. In the case of the two foliations above, the resulting sequences of (“reconstructed”) states and updates are respectively:

As a more complicated example, consider recursively evaluating the Fibonacci number f[3] as above. Now the possible (maximal) foliations are:

For each of these foliations we can then reconstruct an explicit “time series” of states, interspersed by “updates” involving varying numbers of events:

So where in all these is the standard evaluation order? Well, it’s not explicitly here, because it involves doing a single event at a time, while all the foliations here are “maximal” in the sense that they aggregate as many events as they can into each spacelike slice. But if we don’t impose this maximality constraint, are there foliations that in a sense “cover” the standard evaluation order? Without the maximality constraint, in the example we’re using there turn out to be not 10 but 1249 possible foliations. And there are 4 that “cover” the standard (“depth-first”) evaluation order (indicated by a dashed red line):

(Only the last foliation here, in which every “slice” is just a single event, can strictly reproduce the standard evaluation order, but the others are all still “consistent with it”.)

In the standard evaluation process, only a single event is ever done at a time. But what if instead one tries to do as many events as possible at a time? Well, that’s what our “maximal foliations” above are about. But one particularly notable case is what corresponds to a breadth-first scan of the causal graph. And this turns out to be covered by the very last maximal foliation we showed above.

How this works may not be immediately obvious from the picture. With our standard layout for the causal graph, the path corresponding to the breadth-first scan is:

But if we lay out the causal graph differently, the path takes on the much-more-obviously-breadth-first form:

And now using this layout for the various configurations of foliations above we get:

We can think of different layouts for the causal graph as defining different “coordinatizations of spacetime”. If the vertical direction is taken to be time, and the horizontal direction space, then different layouts in effect place events at different positions in time and space. And with the layout here, the last foliation above is “flat”, in the sense that successive slices of the foliation can be thought of as directly corresponding to successive “steps in time”.

In physics terms, different foliations correspond to different “reference frames”. And the “flat” foliation can be thought of as being like the cosmological rest frame, in which the observer is “at rest with respect to the universe”. In terms of states and events, we can also interpret this another way: we can say it’s the foliation in which in some sense the “largest possible number of events are being packed in at each step”. Or, more precisely, if at each step we scan from left to right, we’re doing every successive event that doesn’t overlap with events we’ve already done at this step:

And actually this also corresponds to what happens if, instead of using the built-in standard evaluator, we explicitly tell the Wolfram Language to repeatedly do replacements in expressions. To compare with what we’ve done above, we have to be a little careful in our definitions, using ⊕ and ⊖ as versions of + and – that have to get explicitly evaluated by other rules. But having done this, we get exactly the same sequence of “intermediate expressions” as in the flat (i.e. “breadth-first”) foliation above:
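A rough Python analog of that repeated-replacement process follows. It uses nested tuples as a stand-in for Wolfram Language expressions, and the function names are hypothetical. In each step, one scan replaces every sum whose two arguments are already integers. Sums created during that scan wait for the next step. The resulting states should match the flat (breadth-first) foliation.

```python
def show(e, top=True):
    """Render a nested sum, omitting the outermost parentheses."""
    if isinstance(e, int):
        return str(e)
    s = f"{show(e[1], False)} + {show(e[2], False)}"
    return s if top else f"({s})"


def step_all(e):
    """One scan: replace every sum of two integers, without rescanning the results in this step."""
    if isinstance(e, int):
        return e
    _, a, b = e
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return ("+", step_all(a), step_all(b))


def generational_states(e):
    states = [e]
    while not isinstance(e, int):  # every non-integer state contains at least one sum of integers
        e = step_all(e)
        states.append(e)
    return states


states = generational_states(("+", ("+", 1, ("+", 2, 2)), ("+", 3, 4)))
for s in states:
    print(show(s))
```

The first step performs 2 + 2 → 4 and 3 + 4 → 7 together, because they touch disjoint parts of the expression.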

In general, different foliations can be thought of as specifying different “event-selection functions” to be applied to determine what events should occur at the next steps from any given state. At one extreme we can pick single-event-at-a-time event selection functions—and at the other extreme we can pick maximum-events-at-a-time event selection functions. In our Physics Project we have called the states obtained by applying maximal collections of events at a time “generational states”. And in effect these states represent the typical way we parse physical “spacetime”—in which we take in “all of space” at every successive moment of time. At a practical level the reason we do this is that the speed of light is somehow fast compared to the operation of our brains: if we look at our local surroundings (say the few hundred meters around us), light from these will reach us in a microsecond, while it takes our brains milliseconds to register what we’re seeing. And this makes it reasonable for us to think of there being an “instantaneous state of space” that we can perceive “all at once” at each particular “moment in time”.
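The timescale comparison is simple arithmetic. The calculation takes “a few hundred meters” as 300 m, the speed of light as about 3.0 × 10^8 m/s, and a brain timescale of roughly a millisecond (an order-of-magnitude assumption):

```latex
t_{\text{light}} = \frac{d}{c} \approx \frac{300\ \text{m}}{3.0 \times 10^{8}\ \text{m/s}} = 1.0 \times 10^{-6}\ \text{s} = 1\ \mu\text{s},
\qquad
\frac{t_{\text{brain}}}{t_{\text{light}}} \sim \frac{10^{-3}\ \text{s}}{10^{-6}\ \text{s}} = 10^{3}
```

So light from our surroundings arrives roughly a thousand times faster than we can register it.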

But what’s the analog of this when it comes to expression evaluation? We’ll discuss this a little more later. But suffice it to say here that it depends on who or what the “observer” of the process of evaluation is supposed to be. If we’ve got different elements of our states laid out explicitly in arrays, say in a GPU, then we might again “perceive all of space at once”. But if, for example, the data associated with states is connected through chains of pointers in memory or the like, and we “observe” this data only when we explicitly follow these pointers, then our perception won’t as obviously involve something we can think of as “bulk space”. But by thinking in terms of foliations (or reference frames) as we have here, we can potentially fit what’s going on into something like space, that seems familiar to us. Or, put another way, we can imagine in effect “programming in a certain reference frame” in which we can aggregate multiple elements of what’s going on into something we can consider as an analog of space—thereby making it familiar enough for us to understand and reason about.

Multiway Evaluation and Multiway Graphs

We can view everything we’ve done so far as dissecting and reorganizing the standard evaluation process. But let’s say we’re just given certain underlying rules for transforming expressions—and then we apply them in all possible ways. It’ll give us a “multiway” generalization of evaluation—in which instead of there being just one path of history, there are many. And in our Physics Project, this is exactly how the transition from classical to quantum physics works. And as we proceed here, we’ll see a close correspondence between multiway evaluation and quantum processes.

But let’s start again with our expression (1 + (2 + 2)) + (3 + 4), and consider all possible ways that individual integer addition “events” can be applied to evaluate this expression. In this particular case, the result is pretty simple, and can be represented by a tree that branches in just two places:

But one thing to notice here is that even at the first step there’s an event that we’ve never seen before: 3 + 4 → 7 applied directly to the original expression. It’s something that’s possible if we apply integer addition in all possible places. But when we start from the standard evaluation process, the basic event 3 + 4 → 7 just never appears with the “expression context” we’re seeing it in here.

Each branch in the tree above in some sense represents a different “path of history”. But there’s a certain redundancy in having all these separate paths—because there are multiple instances of the same expression that appear in different places. And if we treat these as equivalent and merge them we now get:

(The question of “state equivalence” is a subtle one that ultimately depends on the operation of the observer, and how the observer constructs their perception of what’s going on. But for our purposes here, we’ll treat expressions as equivalent if they are structurally the same, i.e. every instance of 1 + 4 or of 5 is “the same” 1 + 4 or 5.)

If we now look only at states (i.e. expressions) we’ll get a multiway graph, of the kind that’s appeared in our Physics Project and in many applications of concepts from it:
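Multiway evaluation for this example can be sketched in Python. The sketch again assumes nested tuples ("+", a, b) and treats states as equivalent exactly when they are structurally equal, which is the simple criterion described above. It explores every possible addition event breadth-first and merges identical states, so it builds the multiway graph rather than the tree. All function names are hypothetical.

```python
from collections import deque


def show(e, top=True):
    if isinstance(e, int):
        return str(e)
    s = f"{show(e[1], False)} + {show(e[2], False)}"
    return s if top else f"({s})"


def event_paths(e, path=()):
    """Yield the path of every sum whose two arguments are integers: each is a possible event."""
    if isinstance(e, int):
        return
    _, a, b = e
    if isinstance(a, int) and isinstance(b, int):
        yield path
    else:
        yield from event_paths(a, path + (1,))
        yield from event_paths(b, path + (2,))


def apply_at(e, path):
    """Perform the addition at path, returning a new expression."""
    if not path:
        return e[1] + e[2]
    parts = list(e)
    parts[path[0]] = apply_at(e[path[0]], path[1:])
    return tuple(parts)


def multiway_graph(start):
    """Explore every possible event breadth-first, merging structurally identical states."""
    seen, edges, queue = {start}, [], deque([start])
    while queue:
        s = queue.popleft()
        for path in event_paths(s):
            t = apply_at(s, path)
            edges.append((s, t))
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen, edges


def count_paths(start, edges):
    """Number of distinct evaluation histories from start to a final state."""
    succ = {}
    for s, t in edges:
        succ.setdefault(s, []).append(t)
    memo = {}

    def paths(s):
        if s not in succ:
            return 1
        if s not in memo:
            memo[s] = sum(paths(t) for t in succ[s])
        return memo[s]

    return paths(start)


start = ("+", ("+", 1, ("+", 2, 2)), ("+", 3, 4))
states, edges = multiway_graph(start)
for s, t in edges:
    print(f"{show(s)}  ->  {show(t)}")
finals = {s for s in states if all(u != s for u, _ in edges)}
print(len(states), "states,", count_paths(start, edges), "paths, final states:", finals)
```

The sketch finds 7 states joined by 8 events. It finds three distinct histories, matching the three total orderings of the causal graph, and all of them end at 12.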

This graph in a sense gives a succinct summary of possible paths of history, which here correspond to possible evaluation paths. The standard evaluation process corresponds to a particular path in this multiway graph:

What about a more complicated case? For example, what is the multiway graph for our recursive computation of Fibonacci numbers? As we’ll discuss at more length below, in order to make sure every branch of our recursive evaluation terminates, we have to give a slightly more careful definition of our function f:

But now here’s the multiway tree for the evaluation of f[2]:

And here’s the corresponding multiway graph:

The leftmost branch in the multiway tree corresponds to the standard evaluation process; here’s the corresponding path in the multiway graph:

Here’s the structure of the multiway graph for the evaluation of f[3]:

Note that (as we’ll discuss more later) all the possible evaluation paths in this case lead to the same final expression, and in fact in this particular example all the paths are of the same length (12 steps, i.e. 12 evaluation events).

In the multiway graphs we’re drawing here, every edge in effect corresponds to an evaluation event. And we can imagine setting up foliations in the multiway graph that divide these events into slices. But what is the significance of these slices? When we did the same kind of thing above for causal graphs, we could interpret the slices as representing “instantaneous states laid out in space”. And by analogy we can interpret a slice in the multiway graph as representing “instantaneous states laid out across branches of history”. In the context of our Physics Project, we can then think of these slices as being like superpositions in quantum mechanics, or states “laid out in branchial space”. And, as we’ll discuss later, just as we can think of elements laid out in “space” as corresponding in the Wolfram Language to parts in a symbolic expression (like a list, a sum, etc.), so now we’re dealing with a new kind of way of aggregating states across branchial space, that has to be represented with new language constructs.

But let’s return to the very simple case of (1 + (2 + 2)) + (3 + 4). Here’s a more complete representation of the multiway evaluation process in this case, including both all the events involved, and the causal relations between them:

The “single-way” evaluation process we discussed above uses only part of this:

And from this part we can pull out the causal relations between events to reproduce the (“single-way”) causal graph we had before. But what if we pull out all the causal relations in our full graph?

What we then have is the multiway causal graph. And from foliations of this, we can construct possible histories—though now they’re multiway histories, with the states at particular time steps now being what amount to superposition states.

In the particular case we’re showing here, the multiway causal graph has a very simple structure, consisting essentially just of a bunch of isomorphic pieces. And as we’ll see later, this is an inevitable consequence of the nature of the evaluation we’re doing here, and its property of causal invariance (and in this case, confluence).
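The confluence of this example can be checked by enumerating its multiway states graph directly. The Python sketch below does not use the Wolfram Language machinery. It treats each addition of two integers as one evaluation event, and it groups states into slices by the number of events needed to reach them, which is one simple choice of foliation. The tuple encoding of expressions and the helper names are assumptions made for this illustration.

```python
from collections import deque

# Hypothetical encoding: ("+", left, right) stands for Plus; bare ints are leaves.
def successors(expr):
    """Yield every expression reachable by one evaluation event (one Plus of two ints)."""
    if isinstance(expr, int):
        return
    op, a, b = expr
    if isinstance(a, int) and isinstance(b, int):
        yield a + b
    for new_a in successors(a):
        yield (op, new_a, b)
    for new_b in successors(b):
        yield (op, a, new_b)

def multiway(start):
    """Breadth-first construction of the multiway graph, also returning its slices."""
    layers, edges, seen = [[start]], set(), {start}
    while True:
        nxt = []
        for s in layers[-1]:
            for t in successors(s):
                edges.add((s, t))
                if t not in seen:
                    seen.add(t)
                    nxt.append(t)
        if not nxt:
            return seen, edges, layers
        layers.append(nxt)

expr = ("+", ("+", 1, ("+", 2, 2)), ("+", 3, 4))
states, edges, layers = multiway(expr)
finals = {s for s in states if all(src != s for src, _ in edges)}
print(len(states), len(edges), [len(layer) for layer in layers], finals)
# 7 8 [1, 2, 2, 1, 1] {12}
```

Every path ends at the single state 12, which is what confluence means for this tiny system. The merge of two branches into (1 + 4) + 7 is where the multiway graph stops being a tree.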

Branchlike Separation

Although what we’ve discussed has already been somewhat complicated, there’s actually been a crucial simplifying assumption in everything we’ve done. We’ve assumed that different transformations on a given expression can never apply to the same part of the expression. Different transformations can apply to different parts of the same expression (corresponding to spacelike-separated evaluation events). But there’s never been a “conflict” between transformations, where multiple transformations can apply to the same part of the same expression.

So what happens if we relax this assumption? In effect it means that we can generate different “incompatible” branches of history—and we can characterize the events that produce this as “branchlike separated”. And when such branchlike-separated events are applied to a given state, they’ll produce multiple states which we can characterize as “separated in branchial space”, but nevertheless correlated as a result of their “common ancestry”—or, in quantum mechanics terms, “entangled”.

As a very simple first example, consider the rather trivial function f defined by

If we evaluate f[f[0]] (for any f) there are immediately two “conflicting” branches: one associated with evaluation of the “outer f”, and one with evaluation of the “inner f”:

We can indicate branchlike-separated pairs of events by a dashed line:

Adding in causal edges, and merging equivalent states, we get:

We see that some events are causally related. The first two events are not—but given that they involve overlapping transformations they are “branchially related” (or, in effect, entangled).

Evaluating the expression f[f[0]+1] gives a more complicated graph, with two different instances of branchlike-separated events:

Extracting the multiway states graph we get

where now we have indicated “branchially connected” states by pink “branchial edges”. Pulling out only these branchial edges then gives the (rather trivial) branchial graph for this evaluation process:

There are many subtle things going on here, particularly related to the treelike structure of expressions. We’ve talked about separations between events: timelike, spacelike and branchlike. But what about separations between elements of an expression? In something like {f[0], f[0], f[0]} it’s reasonable to extend our characterization of separations between events, and say that the f[0]’s in the expression can themselves be considered spacelike separated. But what about in something like f[f[0]]? We can say that the f[_]’s here “overlap”—and “conflict” when they are transformed—making them branchlike separated. But the structure of the expression also inevitably makes them “treelike separated”. We’ll see later how to think about the relation between treelike-separated elements in more fundamental terms, ultimately using hypergraphs. But for now an obvious question is what in general the relation between branchlike-separated elements can be.

And essentially the answer is that branchlike separation has to “come with” some other form of separation: spacelike, treelike, rulelike, etc. Rulelike separation involves having multiple rules for the same object (e.g. a rule as well as )—and we’ll talk about this later. With spacelike separation, we basically get branchlike separation when subexpressions “overlap”. This is fairly subtle for tree-structured expressions, but is much more straightforward for strings, and indeed we have discussed this case extensively in connection with our Physics Project.
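For strings, the overlap criterion can be made mechanical. The Python sketch below assumes a hypothetical rule whose left-hand side is AA, since the rule used in this section is not reproduced in the text. It finds every place the left-hand side matches. It then labels two matches as branchlike separated if they share characters, and as purely spacelike separated if they do not.

```python
from itertools import combinations

def match_positions(s, lhs):
    """All start indices where lhs occurs in s, including overlapping occurrences."""
    return [i for i in range(len(s) - len(lhs) + 1) if s.startswith(lhs, i)]

def classify_pairs(s, lhs):
    """Split pairs of matches into overlapping (branchlike) and disjoint (spacelike) ones."""
    branchlike, spacelike = [], []
    for i, j in combinations(match_positions(s, lhs), 2):
        # Positions are sorted, so the matches overlap iff j starts before i's match ends.
        (branchlike if j < i + len(lhs) else spacelike).append((i, j))
    return branchlike, spacelike

b, sp = classify_pairs("AAAAAA", "AA")   # the left-hand side "AA" is an assumption
print(len(b), len(sp))  # 4 6
```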

Consider the (rather trivial) string rewriting rule:

Applying this rule to AAAAAA we get:

Some of the events here are purely spacelike separated, but whenever the characters they involve overlap, they are also branchlike separated (as indicated by the dashed pink lines). Extracting the multiway states graph we get:

And now we get the following branchial graph:

So how can we see analogs in expression evaluation? It turns out that combinators provide a good example (and, yes, it’s quite remarkable that we’re using combinators here to help explain something—given that combinators almost always seem like the most obscure and difficult-to-explain things around). Define the standard S and K combinators:

Now we have for example

where there are many spacelike-separated events, and a single pair of branchlike + treelike-separated ones. With a slightly more complicated initial expression, we get the rather messy result

now with many branchlike-separated states:

Rather than using the full standard S, K combinators, we can consider a simpler combinator definition:

Now we have for example

where the branchial graph is

and the multiway causal graph is:

The expression f[f[f][f]][f] gives a more complicated multiway graph

and branchial graph:

Interpretations, Analogies and the Concept of Multi

Before we started talking about branchlike separation, the only kinds of separation we considered were timelike and spacelike. And in this case we were able to take the causal graphs we got, and set up foliations of them where each slice could be thought of as representing a sequential step in time. In effect, what we were doing was to aggregate things so that we could talk about what happens in “all of space” at a particular time.

But when there’s branchlike separation we can no longer do this. Because now there isn’t a single, consistent “configuration of all of space” that can be thought of as evolving in a single thread through time. Rather, there are “multiple threads of history” that wind their way through the branchings (and mergings) that occur in the multiway graph. One can make foliations in the multiway graph—much like one does in the causal graph. (More strictly, one really needs to make the foliations in the multiway causal graph—but these can be “inherited” by the multiway graph.)

In physics terms, the (single-way) causal graph can be thought of as a discrete version of ordinary spacetime—with a foliation of it specifying a “reference frame” that leads to a particular identification of what one considers space, and what time. But what about the multiway causal graph? In effect, we can imagine that it defines a new, branchial “direction”, in addition to the spatial direction. Projecting in this branchial direction, we can then think of getting a kind of branchial analog of spacetime that we can call branchtime. And when we construct the multiway graph, we can basically imagine that it’s a representation of branchtime.

A particular slice of a foliation of the (single-way) causal graph can be thought of as corresponding to an “instantaneous state of (ordinary) space”. So what does a slice in a foliation of the multiway graph represent? It’s effectively a branchial or multiway combination of states—a collection of states that can somehow all exist “at the same time”. And in physics terms we can interpret it as a quantum superposition of states.

But how does all this work in the context of expressions? The parts of a single expression like a + b + c + d or {a, b, c, d} can be thought of as being spacelike separated, or in effect “laid out in space”. But what kind of a thing has parts that are “laid out in branchial space”? It’s a new kind of fundamentally multiway construct. We’re not going to explore it too much here, but in the Wolfram Language we might in future call it Multi. And just as {a, b, c, d} (or List[a, b, c, d]) can be thought of as representing a, b, c, d “laid out in space”, so now Multi[a, b, c, d] would represent a, b, c, d “laid out in branchial space”.

In ordinary evaluation, we just generate a specific sequence of individual expressions. But in multiway evaluation, we can imagine that we generate a sequence of Multi objects. In the examples we’ve seen so far, we always eventually get a Multi containing just a single expression. But we’ll soon find out that that’s not always how things work, and we can perfectly well end up with a Multi containing multiple expressions.

So what might we do with a Multi? In a typical “nondeterministic computation” we probably want to ask: “Does the Multi contain some particular expression or pattern that we’re looking for?” If we imagine that we’re doing a “probabilistic computation” we might want to ask about the frequencies of different kinds of expressions in the Multi. And if we’re doing quantum computation with the normal formalism of quantum mechanics, we might want to tag the elements of the Multi with “quantum amplitudes” (that, yes, in our model presumably have magnitudes determined by path counting in the multiway graph, and phases representing the “positions of elements in branchial space”). And in a traditional quantum measurement, the concept would typically be to determine a projection of a Multi, or in effect an inner product of Multi objects. (And, yes, if one knows only that projection, it’s not going to be enough to let one unambiguously continue the “multiway computation”; the quantum state has in effect been “collapsed”.)
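The Python sketch below makes these kinds of questions concrete. It models a slice as a mapping from each state to the number of paths that reach it, which keeps the path counting mentioned above but drops any phases. This is only an illustration, not a proposal for how a Wolfram Language Multi would work. The string rule BA → AB and the names step and frequencies are assumptions.

```python
from collections import Counter

def step(multi, rules):
    """Advance a slice by one event on every branch.

    The slice is modeled as a Counter: state -> number of paths reaching it.
    States to which no rule applies have no successors, so they drop out of the next slice.
    """
    nxt = Counter()
    for state, paths in multi.items():
        for lhs, rhs in rules:
            for i in range(len(state) - len(lhs) + 1):
                if state.startswith(lhs, i):
                    nxt[state[:i] + rhs + state[i + len(lhs):]] += paths
    return nxt

def frequencies(multi):
    """Relative path weight of each state in the slice."""
    total = sum(multi.values())
    return {state: n / total for state, n in multi.items()}

rules = [("BA", "AB")]            # hypothetical sorting rule, assumed for illustration
layers = [Counter({"BABA": 1})]
for _ in range(3):
    layers.append(step(layers[-1], rules))

print(frequencies(layers[1]))                  # {'ABBA': 0.5, 'BAAB': 0.5}
print("AABB" in layers[3], layers[3]["AABB"])  # True 2
```

The "does it contain" question is a membership test, and the "probabilistic" question is a normalization of path counts. The second slice collapses to the single state ABAB, reached by two paths.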

Is There Always a Definite Result?

For an expression like (1 + (2 + 2)) + (3 + 4) it doesn’t matter in what order one evaluates things; one always gets the same result—so that the corresponding multiway graph leads to just a single final state:

But it’s not always true that there’s a single final state. For example, with the definitions

standard evaluation in the Wolfram Language gives the result 0 for f[f[0]] but the full multiway graph shows that (with a different evaluation order) it’s possible instead to get the result g[g[0]]:

And in general when a certain collection of rules (or definitions) always leads to just a single result, one says that the collection of rules is confluent; otherwise it’s not. Pure arithmetic turns out to be confluent. But there are plenty of examples (e.g. in string rewriting) that are not. Ultimately a failure of confluence must come from the presence of branchlike separation—or in effect a conflict between behavior on two different branches. And so in the example above we see that there are branchlike-separated “conflicting” events that never resolve—yielding two different final outcomes:

As an even simpler example, consider the definitions and . In the Wolfram Language these definitions immediately overwrite each other. But assume they could both be applied (say through explicit , rules). Then there’s a multiway graph with two “unresolved” branches—and two outcomes:

For string rewriting systems, it’s easy to enumerate possible rules. The rule

(that effectively sorts the elements in the string) is confluent:

But the rule

is not confluent

and “evaluates” BABABA to four distinct outcomes:

These are all cases where “internal conflicts” lead to multiple different final results. But another way to get different results is through “side effects”. Consider first setting x = 0 then evaluating {x = 1, x + 1}:

If the order of evaluation is such that x + 1 is evaluated before x = 1 it will give 1, otherwise it will give 2, leading to the two different outcomes {1, 1} and {1, 2}. In some ways this is like the example above where we had two distinct rules: and . But there’s a difference. While explicit rules are essentially applied only “instantaneously”, an assignment like x = 1 has a “permanent” effect, at least until it is “overwritten” by another assignment. In an evaluation graph like the one above we’re showing particular expressions generated during the evaluation process. But when there are assignments, there’s an additional “hidden state” that in the Wolfram Language one can think of as corresponding to the state of the global symbol table. If we included this, then we’d again see rules that apply “instantaneously”, and we’d be able to explicitly trace causal dependencies between events. But if we elide it, then we effectively hide the causal dependence that’s “carried” by the state of the symbol table, and the evaluation graphs we’ve been drawing are necessarily somewhat incomplete.
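The role of the hidden state can be seen in a minimal Python sketch of the two evaluation orders. As an assumption of the sketch, a plain dictionary stands in for the Wolfram Language global symbol table. The two orders differ only in when x + 1 reads that table.

```python
def evaluate_list(order):
    """Evaluate {x = 1, x + 1} with x initially 0, doing the two elements in the given order.

    `symbols` stands in for the global symbol table: the "hidden state" that carries
    the causal dependence between the two elements.
    """
    symbols = {"x": 0}
    results = [None, None]
    for k in order:
        if k == 0:                        # first element: the assignment x = 1 (its value is 1)
            symbols["x"] = 1
            results[0] = 1
        else:                             # second element: x + 1 reads the current table
            results[1] = symbols["x"] + 1
    return results

print(evaluate_list([0, 1]), evaluate_list([1, 0]))  # [1, 2] [1, 1]
```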

Computations That Never End

The basic operation of the Wolfram Language evaluator is to keep doing transformations until the result no longer changes (or, in other words, until a fixed point is reached). And that’s convenient for being able to “get a definite answer”. But it’s rather different from what one usually imagines happens in physics. Because in that case we’re typically dealing with things that just “keep progressing through time”, without ever getting to any fixed point. (“Spacetime singularities”, say in black holes, do for example involve reaching fixed points where “time has come to an end”.)

But what happens in the Wolfram Language if we just type x = x + 1, without giving any value to x? The Wolfram Language evaluator will keep evaluating this, trying to reach a fixed point. But it’ll never get there. And in practice it’ll give a message, and (at least in Version 13.3 and above) return a TerminatedEvaluation object:

What’s going on inside here? If we look at the evaluation graph, we can see that it involves an infinite chain of evaluation events, that progressively “extrude” +1’s:

A slightly simpler case (that doesn’t raise questions about the evaluation of Plus) is to consider the definition

which has the effect of generating an infinite chain of progressively more “f-nested” expressions:

Let’s say we define two functions:

Now we don’t just get a simple chain of results; instead we get an exponentially growing multiway graph:

In general, whenever we have a recursive definition (say of f in terms of f or x in terms of x) there’s the possibility of an infinite process of evaluation, with no “final fixed point”. There are of course specific cases of recursive definitions that always terminate—like the Fibonacci example we gave above. And indeed when we’re dealing with so-called “primitive recursion” this is how things inevitably work: we’re always “systematically counting down” to some defined base case (say f[1] = 1).
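The Python sketch below rewrites to a fixed point and contrasts the two situations. It assumes a hypothetical factorial-style definition f[n] = n f[n - 1] with an optional base case f[1] = 1. The step cap is an arbitrary choice that only loosely plays the role of an evaluator's iteration limit. With the base case, evaluation counts down and stops. Without it, evaluation runs on into f[0], f[-1], and so on, until the cap is hit.

```python
def rewrite_once(e, base_case):
    """Apply one rule at the first applicable position, or return None if no rule applies.

    Hypothetical encoding: ("f", n) is f[n], ("*", a, b) is a product, ints are values.
    Rules: f[1] -> 1 (only if base_case), f[n] -> n * f[n - 1], int * int -> int.
    """
    if isinstance(e, int):
        return None
    if e[0] == "f":
        n = e[1]
        return 1 if (base_case and n == 1) else ("*", n, ("f", n - 1))
    _, a, b = e
    if isinstance(a, int) and isinstance(b, int):
        return a * b
    new_a = rewrite_once(a, base_case)
    if new_a is not None:
        return ("*", new_a, b)
    new_b = rewrite_once(b, base_case)
    if new_b is not None:
        return ("*", a, new_b)
    return None

def evaluate(e, base_case, max_steps=200):
    """Rewrite until a fixed point is reached, or give up after max_steps events."""
    for steps in range(max_steps):
        new = rewrite_once(e, base_case)
        if new is None:
            return e, steps
        e = new
    return "terminated", max_steps

print(evaluate(("f", 5), base_case=True))   # (120, 9)
print(evaluate(("f", 5), base_case=False))  # ('terminated', 200)
```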

When we look at string rewriting (or, for that matter, hypergraph rewriting), evolution that doesn’t terminate is quite ubiquitous. And in direct analogy with, for example, the string rewriting rule A → BBB, BB → A we can set up the definitions

and then the (infinite) multiway graph begins:

One might think that the possibility of evaluation processes that don’t terminate would be a fundamental problem for a system set up like the Wolfram Language. But it turns out that in current normal usage one basically never runs into the issue except by mistake, when there’s a bug in one’s program.

Still, if one explicitly wants to generate an infinite evaluation structure, it’s not hard to do so. Beyond one can define

and then one gets the multiway graph

which has CatalanNumber[t] (or asymptotically ~4^t) states at layer t.
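These layer counts can be checked with a few lines of standard-library Python. The asymptotic form C_t ~ 4^t / (t^(3/2) √π) implies that the ratio of successive Catalan numbers approaches 4. Exactly, that ratio is C_t / C_(t-1) = 2(2t - 1)/(t + 1).

```python
from math import comb

def catalan(t):
    """The t-th Catalan number, C_t = binom(2t, t) / (t + 1)."""
    return comb(2 * t, t) // (t + 1)

print([catalan(t) for t in range(8)])   # [1, 1, 2, 5, 14, 42, 132, 429]
print(catalan(200) / catalan(199))      # 3.970..., i.e. 798/201, approaching 4
```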

Another “common bug” form of non-terminating evaluation arises when one makes a primitive-recursion-style definition without giving a “boundary condition”. Here, for example, is the Fibonacci recursion without f[0] and f[1] defined:

And in this case the multiway graph is infinite

with ~2^t states at layer t.

But consider now the “unterminated factorial recursion”

On its own, this just leads to a single infinite chain of evaluation

but if we add the explicit rule that multiplying anything by zero gives zero (i.e. 0 _ → 0) then we get

in which there’s a “zero sink” in addition to an infinite chain of f[–n] evaluations.

Some definitions have the property that they provably always terminate, though it may take a while. An example is the combinator definition we made above:

Here’s the multiway graph starting with f[f[f][f]][f], and terminating in at most 10 steps:

Starting with f[f[f][f][f][f]][f] the multiway graph becomes

but again the evaluation always terminates (and gives a unique result). In this case we can see why this happens: at each step f[x_][y_] effectively “discards ”, thereby “fundamentally getting smaller”, even as it “puffs up” by making three copies of .

But if instead one uses the definition

things get more complicated. In some cases, the multiway evaluation always terminates

while in others, it never terminates:

But then there are cases where there is sometimes termination, and sometimes not:

In this particular case, what’s happening is that evaluation of the first argument of the “top-level f” never terminates, but if the top-level f is evaluated before its arguments then there’s immediate termination. Since the standard Wolfram Language evaluator evaluates arguments first (“leftmost-innermost evaluation”), it therefore won’t terminate in this case—even though there are branches in the multiway evaluation (corresponding to “outermost evaluation”) that do terminate.
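The same phenomenon already appears in a minimal term-rewriting setting. In the Python sketch below, omega → s[omega] never terminates and g[_] → 0 discards its argument. These rules are hypothetical and are not the combinator definition used above. The step budget is also an assumption. Evaluating arguments first never finishes, while rewriting the outermost g first finishes in one step.

```python
def rewrite_step(term, strategy):
    """Apply one rewrite, choosing the redex by strategy; return None at a fixed point.

    Hypothetical rules: omega -> s[omega] (never terminates) and g[_] -> 0 (discards its argument).
    """
    if term == "omega":
        return ("s", "omega")
    if not isinstance(term, tuple):
        return None
    head, arg = term
    if strategy == "outermost" and head == "g":
        return 0
    new_arg = rewrite_step(arg, strategy)   # innermost: rewrite inside the argument first
    if new_arg is not None:
        return (head, new_arg)
    if head == "g":
        return 0
    return None

def run(term, strategy, max_steps=100):
    """Rewrite until a fixed point; return (None, max_steps) if none is reached in budget."""
    for n in range(max_steps):
        nxt = rewrite_step(term, strategy)
        if nxt is None:
            return term, n
        term = nxt
    return None, max_steps

print(run(("g", "omega"), "innermost"))  # (None, 100)
print(run(("g", "omega"), "outermost"))  # (0, 1)
```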

Transfinite Evaluation

If a computation reaches a fixed point, we can reasonably say that that’s the “result” of the computation. But what if the computation goes on forever? Might there still be some “symbolic” way to represent what happens—that for example allows one to compare results from different infinite computations?

In the case of ordinary numbers, we know that we can define a “symbolic infinity” ∞ (Infinity in Wolfram Language) that represents an infinite number and has all the obvious basic arithmetic properties:

But what about infinite processes, or, more specifically, infinite multiway graphs? Is there some useful symbolic way to represent such things? Yes, they’re all “infinite”. But somehow we’d like to distinguish between infinite graphs of different forms, say:

And already for integers, it’s been known for more than a century that there’s a more detailed way to characterize infinities than just referring to them all as ∞: it’s to use the idea of transfinite numbers. And in our case we can imagine successively numbering the nodes in a multiway graph, and seeing what the largest number we reach is. For an infinite graph of the form

(obtained say from x = x + 1 or x = {x}) we can label the nodes with successive integers, and we can say that the “largest number reached” is the transfinite ordinal ω.

A graph consisting of two infinite chains is then characterized by 2ω, while an infinite 2D grid is characterized by ω², and an infinite binary tree is characterized by 2^ω.

What about larger numbers? To get to ω^ω we can use a rule like

that effectively yields a multiway graph that corresponds to a tree in which successive layers have progressively larger numbers of branches:

One can think of a definition like x = x + 1 as setting up a “self-referential data structure”, whose specification is finite (in this case essentially a loop), and where the infinite evaluation process arises only when one tries to get an explicit value out of the structure. More elaborate recursive definitions can’t, however, readily be thought of as setting up straightforward self-referential data structures. But they still seem able to be characterized by transfinite numbers.

In general many multiway graphs that differ in detail will be associated with a given transfinite number. But the expectation is that transfinite numbers can potentially provide robust characterizations of infinite evaluation processes, with different constructions of the “same evaluation” able to be identified as being associated with the same canonical transfinite number.

Most likely, definitions purely involving pattern matching won’t be able to generate infinite evaluations beyond $\varepsilon_0 = \omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}$—which is also the limit of what one can reach with proofs based on ordinary induction, Peano Arithmetic, etc. It’s perfectly possible to go further—but one needs to explicitly use functions like NestWhile etc. in the definitions that are given.
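For reference, the standard characterization of ε₀ is as the limit of this tower of ω's. Equivalently, it is the smallest ordinal left unchanged by exponentiation with base ω:

```latex
\varepsilon_0 \;=\; \sup\bigl\{\, \omega,\; \omega^{\omega},\; \omega^{\omega^{\omega}},\; \dots \bigr\},
\qquad
\omega^{\varepsilon_0} = \varepsilon_0 .
```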

And there’s another issue as well: given a particular set of definitions, there’s no limit to how difficult it can be to determine the ultimate multiway graph that’ll be produced. In the end this is a consequence of computational irreducibility, and of the undecidability of the halting problem, etc. And what one can expect in the end is that some infinite evaluation processes one will be able to prove can be characterized by particular transfinite numbers, but others one won’t be able to “tie down” in this way—and in general, as computational irreducibility might suggest, won’t ever allow one to give a “finite symbolic summary”.

The Question of the Observer

One of the key lessons of our Physics Project is the importance of the character of the observer in determining what one “takes away” from a given underlying system. And in setting up the evaluation process—say in the Wolfram Language—the typical objective is to align with the way human observers expect to operate. And so, for example, one normally expects that one will give an expression as input, then in the end get an expression as output. The process of transforming input to output is analogous to the doing of a calculation, the answering of a question, the making of a decision, the forming of a response in human dialog, and potentially the forming of a thought in our minds. In all of these cases, we treat the output as something definite and “static”.

It’s very different from the way physics operates, because in physics “time always goes on”: there’s (essentially) always another step of computation to be done. In our usual description of evaluation, we talk about “reaching a fixed point”. But an alternative would be to say that we reach a state that just repeats unchanged forever—but we as observers equivalence all those repeats, and think of it as having reached a single, unchanging state.

Any modern practical computer also fundamentally works much more like physics: there are always computational operations going on—even though those operations may end up, say, continually putting the exact same pixel in the same place on the screen, so that we can “summarize” what’s going on by saying that we’ve reached a fixed point.

There’s much that can be done with computations that reach fixed points, or, equivalently, with functions that return definite values. And in particular it’s straightforward to compose such computations or functions, continually taking output and then feeding it in as input. But there’s a whole world of other possibilities that open up once one can deal with infinite computations. As a practical matter, one can treat such computations “lazily”—representing them as purely symbolic objects from which one can derive particular results if one explicitly asks to do so.

One kind of result might be of the type typical in logic programming or automated theorem proving: given a potentially infinite computation, is it ever possible to reach a specified state (and, if so, what is the path to do so)? Another type of result might involve extracting a particular “time slice” (with some choice of foliation), and in general representing the result as a Multi. And still another type of result (reminiscent of “probabilistic programming”) might involve not giving an explicit Multi, but rather computing certain statistics about it.

And in a sense, each of these different kinds of results can be thought of as what’s extracted by a different kind of observer, who is making different kinds of equivalences.

We have a certain typical experience of the physical world that’s determined by features of us as observers. For example, as we mentioned above, we tend to think of “all of space” progressing “together” through successive moments of time. And the reason we think this is that the regions of space we typically see around us are small enough that the speed of light delivers information on them to us in a time that’s short compared to our “brain processing time”. If we were bigger or faster, then we wouldn’t be able to think of what’s happening in all of space as being “simultaneous” and we’d immediately be thrust into issues of relativity, reference frames, etc.

And in the case of expression evaluation, it’s very much the same kind of thing. If we have an expression laid out in computer memory (or across a network of computers), then there’ll be a certain time to “collect information spatially from across the expression”, and a certain time that can be attributed to each update event. And the essence of array programming (and much of the operation of GPUs) is that one can assume—like in the typical human experience of physical space—that “all of space” is being updated “together”.

But in our analysis above, we haven’t assumed this, and instead we’ve drawn causal graphs that explicitly trace dependencies between events, and show which events can be considered to be spacelike separated, so that they can be treated as “simultaneous”.

We’ve also seen branchlike separation. In the physics case, the assumption is that we as observers sample in an aggregated way across extended regions in branchial space—just as we do across extended regions in physical space. And indeed the expectation is that we encounter what we describe as “quantum effects” precisely because we are of limited extent in branchial space.

In the case of expression evaluation, we’re not used to being extended in branchial space. We typically imagine that we’ll follow some particular evaluation path (say, as defined by the standard Wolfram Language evaluator), and be oblivious to other paths. But, for example, strategies like speculative execution (typically applied at the hardware level) can be thought of as representing extension in branchial space.

And at a theoretical level, one certainly thinks of different kinds of “observations” in branchial space. In particular, there’s nondeterministic computation, in which one tries to identify a particular “thread of history” that reaches a given state, or a state with some property one wants.

One crucial feature of observers like us is that we are computationally bounded—which puts limitations on the kinds of observations we can make. And for example computational irreducibility then limits what we can immediately know (and aggregate) about the evolution of systems through time. And similarly multicomputational irreducibility limits what we can immediately know (and aggregate) about how systems behave across branchial space. And insofar as any computational devices we build in practice must be ones that we as observers can deal with, it’s inevitable that they’ll be subject to these kinds of limitations. (And, yes, in talking about quantum computers there tends to be an implicit assumption that we can in effect overcome multicomputational irreducibility, and “knit together” all the different computational paths of history—but it seems implausible that observers like us can actually do this, or can in general derive definite results without expending computationally irreducible effort.)

One further small comment about observers concerns what in physics are called closed timelike curves—essentially loops in time. Consider the definition:

This gives for example the multiway graph:

One can think of this as connecting the future to the past—something that’s sometimes interpreted as “allowing time travel”. But really this is just a more (time-)distributed version of a fixed point. In a fixed point, a single state is constantly repeated. Here a sequence of states (just two in the example given here) get visited repeatedly. The observer could treat these states as continually repeating in a cycle, or could coarse grain and conclude that “nothing perceptible is changing”.
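An observer could detect such a loop with the Python sketch below. It assumes deterministic rules given as a hypothetical lookup table, and it treats states without a rule as fixed points. The sketch follows the states until one repeats, then splits the history into a transient part and the repeating cycle. A fixed point is then simply a cycle of length one.

```python
def orbit_with_cycle(state, next_state, max_steps=1000):
    """Follow a deterministic rule until some state repeats.

    Returns (transient, cycle): the states visited before entering the loop,
    and the states that then recur forever (cycle is [] if no repeat within max_steps).
    """
    first_seen, path = {}, []
    while state not in first_seen and len(path) < max_steps:
        first_seen[state] = len(path)
        path.append(state)
        state = next_state(state)
    if state not in first_seen:
        return path, []
    k = first_seen[state]
    return path[:k], path[k:]

rules = {"f[0]": "f[a]", "f[a]": "f[b]", "f[b]": "f[a]"}   # hypothetical two-state loop

def next_state(s):
    return rules.get(s, s)   # no rule for s: treat it as a fixed point

print(orbit_with_cycle("f[0]", next_state))   # (['f[0]'], ['f[a]', 'f[b]'])
print(orbit_with_cycle("g", next_state))      # ([], ['g'])
```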

In spacetime we think of observers as making particular choices of simultaneity surfaces—or in effect picking particular ways to “parse” the causal graph of events. In branchtime the analog of this is that observers pick how to parse the multiway graph. Or, put another way, observers get to choose a path through the multiway graph, corresponding to a particular evaluation order or evaluation scheme. In general, there is a tradeoff between the choices made by the observer, and the behavior generated by applying the rules of the system.

But if the observer is computationally bounded, they cannot overcome the computational irreducibility—or multicomputational irreducibility—of the behavior of the system. And as a result, if there is complexity in the detailed behavior of the system, the observer will not be able to avoid it at a detailed level by the choices they make. Still, a critical idea of our Physics Project is that, through appropriate aggregation, the observer will detect certain aggregate features of the system that have robust characteristics independent of the underlying details. In physics, this represents a bulk theory suitable for the perception of the universe by observers like us. And presumably there is an analog of this in expression evaluation. But insofar as we’re only looking at the evaluation of expressions we’ve engineered for particular computational purposes, we’re not yet used to seeing “generic bulk expression evaluation”.

But this is exactly what we’ll see if we just go out and run “arbitrary programs”, say found by enumerating certain classes of programs (like combinators or multiway Turing machines). And for observers like us these will inevitably “seem very much like physics”.

The Tree Structure of Expressions

Although we haven’t talked about this so far, any expression fundamentally has a tree structure. So, for example, (1 + (2 + 2)) + (3 + 4) is represented—say internally in the Wolfram Language—as the tree:

So how does this tree structure interact with the process of evaluation? In practice it means for example that in the standard Wolfram Language evaluator there are two different kinds of recursion going on. The first is the progressive (“timelike”) reevaluation of subexpressions that change during evaluation. And the second is the (“spacelike” or “treelike”) scanning of the tree.

In what we’ve discussed above, we’ve focused on evaluation events and their relationships, and in doing so we’ve concentrated on the first kind of recursion—and indeed we’ve often elided some of the effects of the second kind by, for example, immediately showing the result of evaluating Plus[2, 2] without showing more details of how this happens.
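The tree-scanning part can be sketched in a few lines of Python. The sketch makes two simplifying assumptions: a strict leftmost-innermost order, and binary Plus nodes encoded as tuples. The real evaluator also deals with heads, attributes and repeated reevaluation, none of which appear here. The sketch logs each evaluation event together with the 1-based part position at which it happens.

```python
def evaluate_tree(expr, pos=(), events=None):
    """Leftmost-innermost evaluation of nested ("+", a, b) tuples.

    Returns the value plus a log of (position, result) evaluation events, with positions
    as 1-based part-number tuples in the style of Wolfram Language part specifications.
    """
    if events is None:
        events = []
    if isinstance(expr, int):
        return expr, events
    _, a, b = expr
    a_val, _ = evaluate_tree(a, pos + (1,), events)  # scan down the tree: first argument
    b_val, _ = evaluate_tree(b, pos + (2,), events)  # then the second argument
    events.append((pos, a_val + b_val))              # one evaluation event at this node
    return a_val + b_val, events

value, events = evaluate_tree(("+", ("+", 1, ("+", 2, 2)), ("+", 3, 4)))
print(value)   # 12
print(events)  # [((1, 2), 4), ((1,), 5), ((2,), 7), ((), 12)]
```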

But here now is a more complete representation of what’s going on in evaluating this simple expression:

The solid gray lines in this “trace graph” indicate the subparts of the expression tree at each step. The dashed gray lines indicate how these subparts are combined to make expressions. And the red lines indicate actual evaluation events where rules (either built in or specified by definitions) are applied to expressions.

It’s possible to read off things like causal dependence between events from the trace graph. But there’s a lot else going on. Much of it is at some level irrelevant—because it involves recursing into parts of the expression tree (like the head Plus) where no evaluation events occur. Removing these parts we then get an elided trace graph in which for example the causal dependence is clearer:

Here’s the trace graph for the evaluation of f[5] with the standard recursive Fibonacci definition

and here’s its elided form:

At least when we discussed single-way evaluation above, we mostly talked about timelike and spacelike relations between events. But with tree-structured expressions there are also treelike relations.

Consider the rather trivial definition

and look at the multiway graph for the evaluation of f[f[0]]:

What is the relation between the event on the left branch, and the top event on the right branch? We can think of them as being treelike separated. The event on the left branch transforms the whole expression tree. But the event on the right branch just transforms a subexpression.

Spacelike-separated events affect disjoint parts in an expression (i.e. ones on distinct branches of the expression tree). But treelike-separated events affect nested parts of an expression (i.e. ones that appear on a single branch in the expression tree). Inevitably, treelike-separated events also have a kind of one-way branchlike separation: if the “higher event” in the tree happens, the “lower one” cannot.

In terms of Wolfram Language part numbers, spacelike-separated events affect parts whose part specifications diverge at some position, say {2, 5} and {2, 8}. But treelike-separated events affect parts where one specification is a prefix of the other, say {2} and {2, 5} or {2, 5} and {2, 5, 1}.
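The part-number criterion can be written down directly. The following Python sketch is an illustration, not a Wolfram Language function. It represents an expression as a nested tuple (head, arg1, arg2, ...), so tuple index 0 plays the role of part 0, the head, just as in Wolfram Language part numbering. Two part specifications are treelike related when one is a prefix of the other, and spacelike related otherwise.

```python
def relation(p, q):
    """Classify the parts two events act on, given Wolfram-style part specifications."""
    common = min(len(p), len(q))
    if p[:common] != q[:common]:
        return "spacelike"  # the paths diverge: distinct branches of the tree
    if len(p) == len(q):
        return "same part"
    return "treelike"  # one part is nested inside the other


def part(expr, spec):
    """Extract a part from a nested tuple (head, arg1, ...); index 0 is the head."""
    for i in spec:
        expr = expr[i]
    return expr


e = ("f", ("f", 0))  # f[f[0]]
print(part(e, ()), part(e, (1,)))  # the whole expression, and the inner f[0]
print(relation((), (1,)))  # treelike: rewriting f[f[0]] vs rewriting f[0]
print(relation((2, 5), (2, 8)))  # spacelike
print(relation((2, 5), (2, 5, 1)))  # treelike
```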

In our Physics Project there’s nothing quite like treelike relations built in. The “atoms of space” are related by a hypergraph—without any kind of explicit hierarchical structure. The hypergraph can take on what amounts to a hierarchical structure, but the fundamental transformation rules won’t intrinsically take account of this.

The hierarchical structure of expressions is incredibly important in their practical use—where it presumably leverages the hierarchical structure of human language, and of ways we talk about the world:

We’ll see soon below that we can in principle represent expressions without having hierarchical structure explicitly built in. But in almost all uses of expressions—say in Wolfram Language—we end up needing to have hierarchical structure.

If we were only doing single-way evaluation the hierarchical structure of expressions would be important in determining the order of evaluation to be used, but it wouldn’t immediately enmesh with core features of the evaluation process. But in multiway evaluation “higher” treelike-separated events can in effect cut off the evaluation histories of “lower” ones—and so it’s inevitably central to the evaluation process. For spacelike- and branchlike-separated events, we can always choose different reference frames (or different spacelike or branchlike surfaces) that arrange the events differently. But treelike-separated events—a little like timelike-separated ones—have a certain forced relationship that cannot be affected by an observer’s choices.

Grinding Everything Down to Hypergraphs

To draw causal graphs—and in fact to do a lot of what we’ve done here—we need to know “what depends on what”. And with our normal setup for expressions this can be quite subtle and complicated. We apply the rule to to give the result . But does the a that “comes out” depend on the a that went in, or is it somehow something that’s “independently generated”? Or, more extremely, in a transformation like , to what extent is it “the same 1” that goes in and comes out? And how do these issues of dependence work when there are the kinds of treelike relations discussed in the previous section?

The Wolfram Language evaluator defines how expressions should be evaluated—but doesn’t immediately specify anything about dependencies. Often we can look “after the fact” and deduce what “was involved” and what was not—and thus what should be considered to depend on what. But it’s not uncommon for it to be hard to know what to say—forcing one to make what seem like arbitrary decisions. So is there any way to avoid this, and to set things up so that dependency becomes somehow “obvious”?

It turns out that there is—though, perhaps not surprisingly, it comes with difficulties of its own. But the basic idea is to go “below expressions”, and to “grind everything down” to hypergraphs whose nodes are ultimate direct “carriers” of identity and dependency. It’s all deeply reminiscent of our Physics Project—and its generalization in the ruliad. Though in those cases the individual elements (or “emes” as we call them) exist far below the level of human perception, while in the hypergraphs we construct for expressions, things like symbols and numbers appear directly as emes.

So how can we “compile” arbitrary expressions to hypergraphs? In the Wolfram Language something like a + b + c is the “full-form” expression Plus[a, b, c],

which corresponds to the tree:

And the point is that we can represent this tree by a hypergraph:

Plus, a, b and c appear directly as “content nodes” in the hypergraph. But there are also “infrastructure nodes” (here labeled with integers) that specify how the different pieces of content are “related”—here with a 5-fold hyperedge representing Plus with three arguments. We can write this hypergraph out in “symbolic form” as:

Let’s say instead we have the expression a + (b + c), or Plus[a, Plus[b, c]], which corresponds to the tree:

We can represent this expression by the hypergraph

which can be rendered visually as:

What does evaluation do to such hypergraphs? Essentially it must transform collections of hyperedges into other collections of hyperedges. So, for example, when x_ + y_ is evaluated, it transforms a set of 3 hyperedges to a single hyperedge according to the rule:

(Here the list on the left-hand side represents three hyperedges in any order—and so is effectively assumed to be orderless.) In this rule, the literal Plus acts as a kind of key to determine what should happen, while the specific patterns define how the input and output expressions should be “knitted together”.
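The Python sketch below uses one plausible encoding. It is an assumption, not necessarily the exact one behind the figures. Each subexpression gets a fresh infrastructure eme, written "#1", "#2", and so on. An atom becomes a 2-element hyperedge (id, atom), and a compound expression becomes (id, head, child ids...). With this encoding a + b + c yields a 5-element Plus hyperedge, and the rule for numeric x_ + y_ consumes three hyperedges and produces one.

```python
from itertools import count


def compile_expr(expr, ids, edges):
    """Append hyperedges for expr to edges and return its infrastructure eme.

    expr is an atom (str or int) or a tuple (head, arg1, arg2, ...)."""
    node = f"#{next(ids)}"
    if isinstance(expr, tuple):
        head, *args = expr
        children = [compile_expr(arg, ids, edges) for arg in args]
        edges.append((node, head, *children))
    else:
        edges.append((node, expr))
    return node


def to_hypergraph(expr):
    edges = []
    compile_expr(expr, count(1), edges)
    return edges


def apply_plus_rule(edges):
    """Apply the numeric x_ + y_ rule once: three hyperedges in, one out (None if no match)."""
    atoms = {e[0]: e for e in edges if len(e) == 2 and isinstance(e[1], int)}
    for e in edges:
        if len(e) == 4 and e[1] == "Plus" and e[2] in atoms and e[3] in atoms:
            consumed = {e, atoms[e[2]], atoms[e[3]]}
            kept = [x for x in edges if x not in consumed]
            return kept + [(e[0], atoms[e[2]][1] + atoms[e[3]][1])]
    return None


print(to_hypergraph(("Plus", "a", "b", "c")))
# [('#2', 'a'), ('#3', 'b'), ('#4', 'c'), ('#1', 'Plus', '#2', '#3', '#4')]
print(to_hypergraph(("Plus", "a", ("Plus", "b", "c"))))
g = apply_plus_rule(to_hypergraph(("Plus", 10, ("Plus", 20, 30))))
print(g)  # the inner 20 + 30 has become ('#3', 50)
```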

So now let’s apply this rule to the expression 10 + (20 + 30). The expression corresponds to the hypergraph

where, yes, there are integers both as content elements, and as labels or IDs for “infrastructure nodes”. The rule operates on collections of hyperedges, always consuming 3 hyperedges, and generating 1. We can think of the hyperedges as “fundamental tokens”. And now we can draw a token-event graph to represent the evaluation process:

Here’s the slightly more complicated case of (10 + (20 + 20)) + (30 + 40):

But here now is the critical point. By looking at whether there are emes in common from one event to another, we can determine whether there is dependency between those events. Emes are in a sense “atoms of existence” that maintain a definite identity, and immediately allow one to trace dependency.
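The next sketch continues with the same assumed encoding. It runs single-way evaluation, always taking the first available match, so it follows only one path through the multiway graph. It also records which event produced each token. A causal edge runs from event A to event B whenever B consumes a token that A produced, and the edge is labeled with the infrastructure emes that token carries. Event 0 stands in for the initial “Big Bang” event. Tracking token identity in this way is a simplification of our own: it keeps a content eme such as the number 40 from appearing to link unrelated events.

```python
from itertools import count


def compile_expr(expr, ids, edges):
    """Assumed encoding: (id, atom) for atoms, (id, head, child ids...) otherwise."""
    node = f"#{next(ids)}"
    if isinstance(expr, tuple):
        head, *args = expr
        children = [compile_expr(arg, ids, edges) for arg in args]
        edges.append((node, head, *children))
    else:
        edges.append((node, expr))
    return node


def infrastructure(token):
    """The infrastructure emes ("#1", "#2", ...) that a token contains."""
    return {x for x in token if isinstance(x, str) and x.startswith("#")}


def is_redex(edge, atoms):
    """True if edge is a Plus hyperedge whose two arguments are both numbers."""
    return len(edge) == 4 and edge[1] == "Plus" and edge[2] in atoms and edge[3] in atoms


def causal_graph(expr):
    """Evaluate single-way with the numeric x_ + y_ rule; return (tokens, causal edges)."""
    edges = []
    compile_expr(expr, count(1), edges)
    token_ids = count()
    tokens = {next(token_ids): e for e in edges}
    producer = dict.fromkeys(tokens, 0)  # event 0 is the "Big Bang"
    causal = {}  # (earlier event, later event) -> emes carried
    event_ids = count(1)
    while True:
        # Map each infrastructure eme that holds a number to the token holding it.
        atoms = {e[0]: t for t, e in tokens.items()
                 if len(e) == 2 and isinstance(e[1], int)}
        match = next((t for t in sorted(tokens) if is_redex(tokens[t], atoms)), None)
        if match is None:
            return list(tokens.values()), causal
        root, _, x, y = tokens[match]
        consumed = [match, atoms[x], atoms[y]]  # always three tokens in...
        event = next(event_ids)
        for t in consumed:
            carried = causal.setdefault((producer[t], event), set())
            carried.update(infrastructure(tokens[t]))
        total = tokens[atoms[x]][1] + tokens[atoms[y]][1]
        for t in consumed:
            del tokens[t]
        new = next(token_ids)
        tokens[new] = (root, total)  # ...and one token out
        producer[new] = event


final, causal = causal_graph(("Plus", 10, ("Plus", 20, 30)))
print(final)  # [('#1', 60)]
for (a, b), emes in sorted(causal.items()):
    print(a, "->", b, sorted(emes))
```

For (10 + (20 + 20)) + (30 + 40), the event that computes 30 + 40 receives no causal edge from the events that compute 20 + 20 or 10 + 40, reflecting that those events are spacelike separated.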

So now we can fill in causal edges, with each edge labeled by the emes it “carries”:

Dropping the hyperedges, and adding in an initial “Big Bang” event, we get the (multiway) causal graph:

We should note that in the token-event graph, each expression has been “shattered” into its constituent hyperedges. Assembling the tokens into recognizable expressions effectively involves setting up a particular foliation of the token-event graph. But if we do this, we get a multiway graph expressed in terms of hypergraphs

or in visual form:

As a slightly more complicated case, consider the recursive computation of the Fibonacci number f[2]. Here is the token-event graph in this case:

And here is the corresponding multiway causal graph, labeled with the emes that “carry causality”:

Every kind of expression can be “ground down” in some way to hypergraphs. For strings, for example, it’s convenient to make a separate token out of every character, so that “ABBAAA” can be represented as:

It’s interesting to note that our hypergraph setup can have a certain similarity to machine-level representations of expressions, with every eme in effect corresponding to a pointer to a certain memory location. Thus, for example, in the representation of the string, the infrastructure emes define the pointer structure for a linked list—with the content emes being the “payloads” (and pointing to globally shared locations, like ones for A and B).
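The linked-list reading of the string can be sketched as follows. The encoding (cell, payload, next cell), with integer cells and a final dangling pointer, is a hypothetical choice made for illustration. The point is only that the infrastructure emes form the pointer chain, while the two content emes, "A" and "B", are shared by every cell that holds them.

```python
def string_to_hypergraph(s):
    """Hypothetical encoding: one hyperedge (cell, payload, next cell) per character."""
    return [(i, ch, i + 1) for i, ch in enumerate(s, start=1)]


def hypergraph_to_string(edges):
    """Follow the 'next' pointers from the first cell, collecting the payloads."""
    if not edges:
        return ""
    cells = {cell: (ch, nxt) for cell, ch, nxt in edges}
    # The first cell is the only one that no other cell points to.
    (node,) = set(cells) - {nxt for _, _, nxt in edges}
    out = []
    while node in cells:
        ch, node = cells[node]
        out.append(ch)
    return "".join(out)


edges = string_to_hypergraph("ABBAAA")
print(edges[:2])  # [(1, 'A', 2), (2, 'B', 3)]
print(sorted({ch for _, ch, _ in edges}))  # ['A', 'B']: the shared content emes
print(hypergraph_to_string(edges[::-1]))  # ABBAAA, whatever order the hyperedges are in
```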

Transformations obtained by applying rules can then be thought of as corresponding just to rearranging pointers. Sometimes “new emes” have to be created, corresponding to new memory being allocated. We don’t have an explicit way to “free” memory. But sometimes some part of the hypergraph will become disconnected—and one can then imagine disconnected pieces to which the observer is not attached being garbage collected.
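Using the same hypothetical (cell, payload, next cell) encoding, here is a sketch of a rewrite that deletes a character purely by redirecting one pointer. A reachability pass then plays the role of garbage collection. Treating the first cell as the place where the observer is attached is an assumption made for the example.

```python
def string_to_hypergraph(s):
    """Hypothetical encoding: one hyperedge (cell, payload, next cell) per character."""
    return [(i, ch, i + 1) for i, ch in enumerate(s, start=1)]


def delete_after(edges, cell):
    """Remove the character after `cell` by rewriting a single pointer (no new emes)."""
    by_cell = {e[0]: e for e in edges}
    _, ch, victim = by_cell[cell]
    nxt = by_cell[victim][2]
    # The victim's hyperedge is left in place; it simply becomes unreachable.
    return [e for e in edges if e[0] != cell] + [(cell, ch, nxt)]


def collect_garbage(edges, root):
    """Keep only the hyperedges reachable from the observer's root cell."""
    by_cell = {e[0]: e for e in edges}
    live, node = [], root
    while node in by_cell:
        live.append(by_cell[node])
        node = by_cell[node][2]
    return live


edges = delete_after(string_to_hypergraph("ABBAAA"), 1)
print(len(edges))  # 6: the spliced-out (2, 'B', 3) is still present
live = collect_garbage(edges, 1)
print(len(live), "".join(ch for _, ch, _ in live))  # 5 ABAAA
```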

The Rulial Case

So far we’ve discussed what happens in the evaluation of particular expressions according to particular rules (where those rules could just be all the ones that are built into Wolfram Language). But the concept of the ruliad suggests thinking about all possible computations—or, in our terms here, all possible evaluations. Instead of particular expressions, we are led to think about evaluating all possible expressions. And we are also led to think about using all possible rules for these evaluations.

As one simple approach to this, instead of looking, for example, at a single combinator definition such as

used to evaluate a single expression such as

we can start enumerating all possible combinator rules

and apply them to evaluate all possible expressions:

Various new phenomena show up here. For example, there is now immediately the possibility of not just spacelike and branchlike separation, but also what we can call rulelike separation.

In a trivial case, we could have rules like

and then evaluating x will lead to two events which we can consider rulelike separated:

In the standard Wolfram Language system, the definitions x = a and x = b would overwrite each other. But if we consider rulial multiway evaluation, we’d have branches for each of these definitions.
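Here is a toy version of the contrast. It uses literal matches only (no patterns), and the helper names are hypothetical. Ordinary evaluation keeps a single definition per symbol, while a rulial multiway step returns one branch for each rule that applies.

```python
def standard_definitions(defs):
    """Ordinary evaluation: a later definition for the same symbol overwrites an earlier one."""
    table = {}
    for lhs, rhs in defs:
        table[lhs] = rhs
    return table


def rulial_branches(expr, defs):
    """Rulial multiway step: every rule whose left-hand side matches yields its own branch."""
    return [(i, rhs) for i, (lhs, rhs) in enumerate(defs) if lhs == expr]


defs = [("x", "a"), ("x", "b")]  # x = a and x = b
print(standard_definitions(defs)["x"])  # b: the second definition wins
print(rulial_branches("x", defs))  # [(0, 'a'), (1, 'b')]: two rulelike-separated events
```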

In what we’ve discussed before, we effectively allow evaluation to take infinite time, as well as infinite space and infinite branchial space. But now we’ve got the new concept of infinite rulial space. We might say from the outset that, for example, we’re going to use all possible rules. Or we might have what amounts to a dynamical process that generates possible rules.

And the key point is that as soon as that process is in effect computationally universal, there is a way to translate from one instance of it to another. Different specific choices will lead to a different basis—but in the end they’ll all eventually generate the full ruliad.

And actually, this is where the whole concept of expression evaluation ultimately merges with fundamental physics. Because in both cases, the limit of what we’re doing will be exactly the same: the full ruliad.

The Practical Computing Story

The formalism we’ve discussed here—and particularly its correspondence with fundamental physics—is in many ways a new story. But it has precursors that go back more than a century. And indeed as soon as industrial processes—and production lines—began to be formalized, it became important to understand interdependencies between different parts of a process. By the 1920s flowcharts had been invented, and when digital computers were developed in the 1940s they began to be used to represent the “flow” of programs (and in fact Babbage had used something similar even in the 1840s). At first, at least as far as programming was concerned, it was all about the “flow of control”—and the sequence in which things should be done. But by the 1970s the notion of the “flow of data” was also widespread—in some ways reflecting back to the actual flow of electrical signals. In some simple cases various forms of “visual programming”—typically based on connecting virtual wires—have been popular. And even in modern times, it’s not uncommon to talk about “computation graphs” as a way to specify how data should be routed in a computation, for example in sequences of operations on tensors (say for neural net applications).

A different tradition—originating in mathematics in the late 1800s—involved the routine use of “abstract functions” like f(x). Such abstract functions could be used both “symbolically” to represent things, and explicitly to “compute” things. All sorts of (often ornate) formalism was developed in mathematical logic, with combinators arriving in 1920, and lambda calculus in 1935. By the late 1950s there was LISP, and by the 1970s there was a definite tradition of “functional programming” involving the processing of things by successive application of different functions.

The question of what really depended on what became more significant whenever there was the possibility of doing computations in parallel. This was already being discussed in the 1960s, but became more popular in the early 1980s, and in a sense finally “went mainstream” with GPUs in the 2010s. And indeed our discussion of causal graphs and spacelike separation isn’t far away from the kind of thing that’s often discussed in the context of designing parallel algorithms and hardware. But one difference is that in those cases one’s usually imagining having a “static” flow of data and control, whereas here we’re routinely considering causal graphs, etc. that are being created “on the fly” by the actual progress of a computation.

In many situations—with both algorithms and hardware—one has precise control over when different “events” will occur. But in distributed systems it’s also common for events to be asynchronous. And in such cases, it’s possible to have “conflicts”, “race conditions”, etc. that correspond to branchlike separation. There have been various attempts—many originating in the 1970s—to develop formal “process calculi” to describe such systems. And in some ways what we’re doing here can be seen as a physics-inspired way to clarify and extend these kinds of approaches.

The concept of multiway systems also has a long history—notably appearing in the early 1900s in connection with game graphs, formal group theory and various problems in combinatorics. Later, multiway systems would implicitly show up in considerations of automated theorem proving and nondeterministic computation. In practical microprocessors it’s been common since the 1990s to do “speculative execution” where multiple branches in code are preemptively followed, keeping only the one that’s relevant given actual input received.

And when it comes to branchlike separation, a notable practical example arises in version control and collaborative editing systems. If a piece of text has changes at two separated places (“spacelike separation”), then these changes (“diffs”) can be applied in any order. But if these changes involve the same content (e.g. same characters) then there can be a conflict (“merge conflict”) if one tries to apply the changes—in effect reflecting the fact that these changes were made by branchlike-separated “change events” (and to trace them requires creating different “forks” or what we might call different histories).
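The diff case is easy to sketch. The sketch assumes that edits are (start, end, replacement) triples on half-open character ranges of the same original text. This is a simplified model, not how any particular version control system stores changes. Non-overlapping edits commute, while overlapping ones are reported as a conflict.

```python
def overlaps(a, b):
    """Edits are (start, end, replacement) on half-open character ranges."""
    return a[0] < b[1] and b[0] < a[1]


def merge(text, a, b):
    """Apply two edits made against the same original text, or report a conflict."""
    if overlaps(a, b):
        raise ValueError("merge conflict: both edits touch the same characters")
    # Apply the rightmost edit first so the other edit's offsets remain valid.
    for start, end, replacement in sorted([a, b], reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


base = "the cat sat on the mat"
left = (4, 7, "dog")  # cat -> dog
right = (19, 22, "rug")  # mat -> rug
print(merge(base, left, right))  # the dog sat on the rug
print(merge(base, right, left))  # same result: the order does not matter
```

Two pure insertions at exactly the same position both have empty ranges, so `overlaps` does not flag them, yet their order does matter. This sketch would need an extra check to treat them as a conflict.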

It’s perhaps worth mentioning that as soon as one has the concept of an “expression” one is led to the concept of “evaluation”—and as we’ve seen many times here, that’s even true for arithmetic expressions, like 1 + (2 + 3). We’ve been particularly concerned with questions about “what depends on what” in the process of evaluation. But in practice there’s often also the question of when evaluation happens. The Wolfram Language, for example, distinguishes between “immediate evaluation” done when a definition is made, and “delayed evaluation” done when it’s used. There’s also lazy evaluation where what’s immediately generated is a symbolic representation of the computation to be done—with steps or pieces being explicitly computed only later, when they are requested.
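As a rough analogy in Python (not Wolfram Language semantics), the three timings look like this. An immediate value is computed at definition time, a delayed one is recomputed at every use, and a lazy one is computed once, on first request. The Lazy class is a hypothetical helper.

```python
class Lazy:
    """A stand-in for a computation that runs only when its value is first requested."""

    def __init__(self, thunk):
        self.thunk = thunk
        self.evaluated = False
        self._value = None

    def value(self):
        if not self.evaluated:
            self._value = self.thunk()
            self.evaluated = True
        return self._value


env = {"y": 1}
immediate = env["y"] + 1  # analogous to x = y + 1: evaluated when defined


def delayed():
    return env["y"] + 1  # analogous to x := y + 1: evaluated at each use


lazy = Lazy(lambda: env["y"] * 100)  # computed once, on first request
env["y"] = 10
print(immediate, delayed(), lazy.evaluated, lazy.value(), lazy.evaluated)
# 2 11 False 1000 True
```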

But what really is “evaluation”? If our “input expression” is 1 + 1, we typically think of this as “defining a computation that can be done”. Then the idea of the “process of evaluation” is that it does that computation, deriving a final “value”, here 2. And one view of the Wolfram Language is that its whole goal is to set up a collection of transformations that can do as many as possible of the computations we know how to do. Some of those transformations effectively incorporate “factual knowledge” (like knowledge of mathematics, or chemistry, or geography). But some are more abstract, like transformations defining how to do transformations, say on patterns.

These abstract transformations are in a sense the easiest to trace—and above that’s often what we’ve concentrated on. But usually we’ve allowed ourselves to do at least some transformations—like adding numbers—that are built into the “insides” of the Wolfram Language. It’s perhaps worth mentioning that in conveniently representing such a broad range of computational processes the Wolfram Language ends up having some quite elaborate evaluation mechanisms. A common example is the idea of functions that “hold their arguments”, evaluating them only as “specifically requested” by the innards of the function. Another, which in effect creates a “side chain” in causal graphs, is the use of conditions (e.g. associated with /;) that need to be evaluated to determine whether patterns are supposed to match.

Evaluation is in a sense the central operation in the Wolfram Language. And what we’ve seen here is that it has a deep correspondence with what we can view as the “central operation” of physics: the passage of time. Thinking in terms of physics helps organize our thinking about the process of evaluation—and it also suggests some important generalizations, like multiway evaluation. And one of the challenges for the future is to see how to take such generalizations and “package” them as part of our computational language in a form that we humans can readily understand and make use of.

Some Personal History: Recursion Control in SMP

It was in late 1979 that I first started to design my SMP (“Symbolic Manipulation Program”) system. I’d studied both practical computer systems and ideas from mathematical logic. And one of my conclusions was that any definition you made should always get used, whenever it could. If you set a = b, and then set b = c, you should get c (not b) if you asked for a. It’s what most people would expect should happen. But like almost all fundamental design decisions, in addition to its many benefits, it had some unexpected consequences. For example, it meant that if you set x = x + 1 without having given a value for x, you’d in principle get an infinite loop.

Back in 1980 there were computer scientists who asserted that this meant the “infinite evaluation” I’d built into the core of SMP “could never work”. Four decades of experience tells us rather definitively that in practice they were wrong about this (essentially because people just don’t end up “falling into the pothole” when they’re doing actual computations they want to do). But questions like those about x = x + 1 made me particularly aware of issues around recursive evaluation. And it bothered me that a recursive factorial definition like f[n_] := n f[n-1] (the rather less elegant SMP notation was f[$n]::$n f[$1-1]) might just run infinitely if it didn’t have a base case (f[1] = 1), rather than terminating with the value 0, which it “obviously should have”, given that at some point one’s computing 0×….

So in SMP I invented a rather elaborate scheme for recursion control that “solved” this problem. And here’s what happens in SMP (now running on a reconstructed virtual machine):

And, yes, if one includes the usual base case for factorial, one gets the usual answer:

So what is going on here? Section 3.1 of the SMP documentation in principle tells the story. In SMP I used the term “simplification” for what I’d now call “evaluation”, both because I imagined that most transformations one wanted would make things “simpler” (as in ), and because there was a nice pun between the name SMP and the function Smp that carried out the core operation of the system (yes, SMP rather foolishly used short names for built-in functions). Also, it’s useful to know that in SMP I called an ordinary expression like f[x, y, …] a “projection”: its “head” f was called its “projector”, and its arguments x, y, … were called “filters”.

As the Version 1.0 documentation from July 1981 tells it, “simplification” proceeds like this:

By the next year, it was a bit more sophisticated, though the default behavior didn’t change:

With the definitions above, the value of f itself was (compare Association in Wolfram Language):

But the key to evaluation without the base case actually came in the “properties” of multiplication:

In SMP True was (foolishly) 1. It’s notable here that Flat corresponds to the attribute Flat in Wolfram Language, Comm to Orderless and Ldist to Listable. (Sys indicated that this was a built-in system function, while Tier dealt with weird consequences of the attempted unification of arrays and functions into an association-like construct.) But the critical property here was Smp. By default its value was Inf (for Infinity). But for Mult (Times) it was 1.

And what this did was to tell the SMP evaluator that inside any multiplication, it should allow a function (like f) to be called recursively at most once before the actual multiplication was done. Telling SMP to trace the evaluation of f[5] we then see:

So what’s going on here? The first time f appears inside a multiplication its definition is used. But when f appears recursively a second time, it’s effectively frozen—and the multiplication is done using its frozen form, with the result that as soon as a 0 appears, one just ends up with 0.
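Here is a small Python model of the behavior just described. It is a hypothetical reconstruction, not SMP’s actual evaluator. It keeps the expression in the form coeff × f[k] and, in each round, expands the frozen f[k] exactly once using f[n_] := n f[n-1]. The multiplication is then carried out before anything else is done.

```python
def smp_factorial(n, base_case=False, max_rounds=10_000):
    """Model f[n_] := n f[n-1] where Times allows one recursive expansion per round.

    The expression is always coeff * f[k], with f[k] frozen until the next round."""
    coeff, k = 1, n
    trace = []
    for _ in range(max_rounds):
        if base_case and k == 1:
            return coeff, trace  # f[1] = 1 was defined, so stop here
        coeff *= k  # expand f[k] -> k f[k-1] once, then do the multiplication
        k -= 1  # the new f[k-1] is left frozen
        trace.append(f"{coeff} f[{k}]")
        if coeff == 0:
            return 0, trace  # 0 times anything is 0: nothing left to evaluate
    raise RecursionError("no zero factor appeared; evaluation would run away")


value, trace = smp_factorial(5)
print(value, trace)
print(smp_factorial(5, base_case=True)[0])  # 120
```

Without the base case the trace runs 5 f[4], 20 f[3], 60 f[2], 120 f[1], 120 f[0], 0 f[-1] and stops at 0. With base_case=True the model returns 120.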

Reset the Smp property of Mult to infinity, and the evaluation runs away, eventually producing a rather indecorous crash:

In effect, the Smp property defines how many recursive evaluations of arguments should be done before a function itself is evaluated. Setting the Smp property to 0 has essentially the same effect as the HoldAll attribute in Wolfram Language: it prevents arguments from being evaluated until a function as a whole is evaluated. Setting Smp to a value k basically tells SMP to do only k levels of “depth-first” evaluation before collecting everything together to do a “breadth-first evaluation”.

Let’s look at this for a recursive definition of Fibonacci numbers:

With the Smp property of Plus set to infinity, the sequence of evaluations of f follows a pure “depth-first” pattern

where we can plot the sequence of f[n] evaluated as:

But with the default setting of 1 for the Smp property of Plus the sequence is different

and now the sequence of f[n] evaluated is:

In the pure depth-first case all the exponentially many leaves of the Fibonacci tree are explicitly evaluated. But now the evaluation of f[n] is being frozen after each step and terms are being collected and combined. Starting for example from f[10] we get f[9] + f[8]. And evaluating another step we get f[8] + f[7] + f[7] + f[6]. But now the f[7]’s can be combined into f[8] + 2f[7] + f[6] so that they don’t both have to separately be evaluated. And in the end only quadratically many separate evaluations are needed to get the final result.
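The effect of collecting terms can be counted with a short Python sketch. It assumes base cases f[0] = f[1] = 1; other base cases change the values but not the counting. It models only the “expand one level, then combine like terms” idea, not SMP’s actual scheduling.

```python
from collections import Counter


def fib_depth_first_calls(n):
    """Number of f[...] evaluations made by pure depth-first recursion."""
    if n < 2:
        return 1
    return 1 + fib_depth_first_calls(n - 1) + fib_depth_first_calls(n - 2)


def fib_collecting(n):
    """Expand one level at a time, combining like terms (e.g. f[8] + 2 f[7] + f[6])."""
    terms = Counter({n: 1})
    evaluations = 0
    value = 0
    while terms:
        new = Counter()
        for k, c in terms.items():
            evaluations += 1  # each distinct f[k] is evaluated once, whatever its coefficient
            if k < 2:
                value += c  # assumed base cases f[0] = f[1] = 1
            else:
                new[k - 1] += c
                new[k - 2] += c
        terms = new
    return value, evaluations


print(fib_collecting(10), fib_depth_first_calls(10))  # (89, 35) 177
```

For f[10], pure depth-first recursion makes 177 calls, while expanding level by level and combining terms evaluates only 35 distinct f[k] terms. There are at most n + 1 distinct terms per level and at most about n levels, so the total grows roughly quadratically.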

I don’t now remember quite why I put it in, but SMP also had another piece of recursion control: the Rec property of a symbol—which basically meant “it’s OK for this symbol to appear recursively; don’t count it when you’re trying to work out whether to freeze an evaluation”.

And it’s worth mentioning that SMP also had a way to handle the original issue:

It wasn’t a terribly general mechanism, but at least it worked in this case:

I always thought that SMP’s “wait and combine terms before recursing” behavior was quite clever, but beyond the factorial and Fibonacci examples here I’m not sure I ever found clear uses for it. Still, with our current physics-inspired way of looking at things, we can see that this behavior basically corresponded to picking a “more spacetime-like” foliation of the evaluation graph.

And it’s a piece of personal irony that right around the time I was trying to figure out recursive evaluation in SMP, I was also working on gauge theories in physics—which in the end involve very much the same kinds of issues. But it took another four decades—and the development of our Physics Project—before I saw the fundamental connection between these things.

After SMP: Further Personal History

The idea of parallel computation was one that I was already thinking about at the very beginning of the 1980s—partly at a theoretical level for things like neural nets and cellular automata, and partly at a practical level for SMP (and indeed by 1982 I had described a Ser property in SMP that was supposed to ensure that the arguments of a particular function would always get evaluated in a definite order “in series”). Then in 1984 I was involved in trying to design a general language for parallel computation on the Connection Machine “massively parallel” computer. The “obvious” approach was just to assume that programs would be set up to operate in steps, even if at each step many different operations might happen in parallel. But I had a sense that there must be a better approach, somehow based on graphs, and graph rewriting. But back then I didn’t, for example, think of formulating things in terms of causal graphs. And while I knew about phenomena like race conditions, I hadn’t yet internalized the idea of constructing multiway graphs to “represent all possibilities”.

When I started designing Mathematica—and what’s now the Wolfram Language—in 1986, I used the same core idea of transformation rules for symbolic expressions that was the basis for SMP. But I was able to greatly streamline the way expressions and their evaluation worked. And not knowing compelling use cases, I decided not to set up the kind of elaborate recursion control that was in SMP, and instead just to concentrate on basically two cases: functions with ordinary (essentially leftmost-innermost) evaluation and functions with held-argument (essentially outermost) evaluation. And I have to say that in three decades of usage and practical applications I haven’t really missed having more elaborate recursion controls.

In working on A New Kind of Science in the 1990s, issues of evaluation order first came up in connection with “symbolic systems” (essentially, generalized combinators). They then came up more poignantly when I explored the possible computational “infrastructure” for spacetime—and indeed that was where I first started explicitly discussing and constructing causal graphs.

But it was not until 2019 and early 2020, with the development of our Physics Project, that clear concepts of spacelike and branchlike separation for events emerged. The correspondence with expression evaluation got clearer in December 2020 when—in connection with the centenary of their invention—I did an extensive investigation of combinators (leading to my book Combinators). And as I started to explore the general concept of multicomputation, and its many potential applications, I soon saw the need for systematic ways to think about multicomputational evaluation in the context of symbolic language and symbolic expressions.

In both SMP and Wolfram Language the main idea is to “get results”. But particularly for debugging it’s always been of interest to see some kind of trace of how the results are obtained. In SMP—as we saw above—there was a Trace property that would cause any evaluation associated with a particular symbol to be printed. But what about an actual computable representation of the “trace”? In 1990 we introduced the function Trace in the Wolfram Language—which produces what amounts to a symbolic representation of an evaluation process.

I had high hopes for Trace—and for its ability to turn things like control flows into structures amenable to direct manipulation. But somehow what Trace produces is almost always too difficult to understand in real cases. And for many years I kept the problem of “making a better Trace” on my to-do list, though without much progress.

The problem of “exposing a process of computation” is quite like the problem of presenting a proof. And in 2000 I had occasion to use automated theorem proving to produce a long proof of my minimal axiom system for Boolean algebra. We wanted to introduce such methods into Mathematica (or what’s now the Wolfram Language). But we were stuck on the question of how to represent proofs—and in 2007 we ended up integrating just the “answer” part of the methods into the function FullSimplify.

By the 2010s we’d had the experience of producing step-by-step explanations in Wolfram|Alpha, as well as exploring proofs in the context of representing pure-mathematical knowledge. And finally in 2018 we introduced FindEquationalProof, which provided a symbolic representation of proofs—at least ones based on successive pattern matching and substitution—as well as a graphical representation of the relationships between lemmas.

After the arrival of our Physics Project—as well as my exploration of combinators—I returned to questions about the foundations of mathematics and developed a whole “physicalization of metamathematics” based on tracing what amount to multiway networks of proofs. But the steps in these proofs were still in a sense purely structural, involving only pattern matching and substitution.

I explored other applications of “multicomputation”, generating multiway systems based on numbers, multiway systems representing games, and so on. And I kept on wondering—and sometimes doing livestreamed discussions about—how best to create a language design around multicomputation. And as a first step towards that, we developed the TraceGraph function in the Wolfram Function Repository, which finally provided a somewhat readable graphical rendering of the output of Trace—and began to show the causal dependencies in at least single-way computation. But what about the multiway case? For the Physics Project we’d already developed MultiwaySystem and related functions in the Wolfram Function Repository. So now the question was: how could one streamline this and have it provide essentially a multiway generalization of TraceGraph? We began to think about—and implement—concepts like Multi, and imagine ways in which general multicomputation could encompass things like logic programming and probabilistic programming, as well as nondeterministic and quantum computation.

But meanwhile, the “x = x + 1” question that had launched my whole adventure in recursion control in SMP was still showing up—43 years later—in the Wolfram Language. It had been there since Version 1.0, though it never seemed to matter much, and we’d always handled it just by having a global “recursion limit”—and then “holding” all further subevaluations:

But over the years there’d been increasing evidence that this wasn’t quite adequate, and that for example further processing of the held form (even, for example, formatting it) could in extreme cases end up triggering even infinite cascades of evaluations. So finally—in Version 13.2 at the end of last year—we introduced the beginnings of a new mechanism to cut off “runaway” computations, based on a construct called TerminatedEvaluation:

And from the beginning we wanted to see how to encode within TerminatedEvaluation information about just what evaluation had been terminated. But to do this once again seemed to require having a way to represent the “ongoing process of evaluation”—leading us back to Trace, and making us think about evaluation graphs, causal graphs, etc.

At the beginning x = x + 1 might just have seemed like an irrelevant corner case—and for practical purposes it basically is. But already four decades ago it led me to start thinking not just about the results of computations, but also how their internal processes can be systematically organized. For years, I didn’t really connect this to my work on explicit computational processes like those in systems such as cellular automata. Hints of such connections did start to emerge as I began to try to build computational models of fundamental physics. But looking back I realize that in x = x + 1 there was already in a sense a shadow of what was to come in our Physics Project and in the whole construction of the ruliad.

Because x = x + 1 is something which—like physics and like the ruliad—necessarily generates an ongoing process of computation. One might have thought that the fact that it doesn’t just “give an answer” was in a sense a sign of uselessness. But what we’ve now realized is that our whole existence and experience is based precisely on “living inside a computational process” (which, fortunately for us, hasn’t just “ended with an answer”). Expression evaluation is in its origins intended as a “human-accessible” form of computation. But what we’re now seeing is that its essence also inevitably encompasses computations that are at the core of fundamental physics. And by seeing the correspondence between what might at first appear to be utterly unrelated intellectual directions, we can expect to inform both of them. Which is what I have started to try to do here.

Notes & Thanks

What I’ve described here builds quite directly on some of my recent work, particularly as covered in my books Combinators: A Centennial View and Metamathematics: Foundations & Physicalization. But as I mentioned above, I started thinking about related issues at the beginning of the 1980s in connection with the design of SMP, and I’d like to thank members of the SMP development team for discussions at that time, particularly Chris Cole, Jeff Greif and Tim Shaw. Thanks also to Bruce Smith for his 1990 work on Trace in Wolfram Language, and for encouraging me to think about symbolic representations of computational processes. In much more recent times, I’d particularly like to thank Jonathan Gorard for his extensive conceptual and practical work on multiway systems and their formalism, both in our Physics Project and beyond. Some of the directions described here have (at least indirectly) been discussed in a number of recent Wolfram Language design review livestreams, with particular participation by Ian Ford, Nik Murzin, and Christopher Wolfram, as well as Dan Lichtblau and Itai Seggev. Thanks also to Wolfram Institute fellows Richard Assar and especially Nik Murzin for their help with this piece.

Remembering Doug Lenat (1950–2023) and His Quest to Capture the World with Logic

Stephen Wolfram — Tue, 05 Sep 2023 22:23:11 +0000

Logic, Math and AI

In many ways the great quest of Doug Lenat’s life was an attempt to follow on directly from the work of Aristotle and Leibniz. For what Doug was fundamentally trying to do over the forty years he spent developing his CYC system was to use the framework of logic—in more or less the same form that Aristotle and Leibniz had it—to capture what happens in the world. It was a noble effort and an impressive example of long-term intellectual tenacity. And while I never managed to actually use CYC myself, I consider it a magnificent experiment—that if nothing else ultimately served to demonstrate the importance of building frameworks beyond logic alone in usefully representing and reasoning about the world.

Doug Lenat started working on artificial intelligence at a time when nobody really knew what might be possible—or even easy—to do. Was AI (whatever that might mean) just a clever algorithm—or a new type of computer—away? Or was it all just an “engineering problem” that simply required pulling together a bigger and better “expert system”? There was all sorts of mystery—and quite a lot of hocus pocus—around AI. Did the demo one was seeing actually prove something, or was it really just a trivial (if perhaps unwitting) cheat?

I first met Doug Lenat at the beginning of the 1980s. I had just developed my SMP (“Symbolic Manipulation Program”) system, that was the forerunner of Mathematica and the modern Wolfram Language. And I had been quite exposed to commercial efforts to “do AI” (and indeed our VCs had even pushed my first company to take on the dubious name “Inference Corporation”, complete with a “=>” logo). And I have to say that when I first met Doug I was quite dismissive. He told me he had a program (that he called “AM” for “Automated Mathematician”, and that had been the subject of his Stanford CS PhD thesis) that could discover—and in fact had discovered—nontrivial mathematical theorems.

“What theorems?” I asked. “What did you put in? What did you get out?” I suppose to many people the concept of searching for theorems would have seemed like something remarkable, and immediately exciting. But not only had I myself just built a system for systematically representing mathematics in computational form, I had also been enumerating large collections of simple programs like cellular automata. I poked at what Doug said he’d done, and came away unconvinced. Right around the same time I happened to be visiting a leading university AI group, who told me they had a system for translating stories from Spanish into English. “Can I try it?” I asked, suspending for a moment my feeling that this sounded like science fiction. “I don’t really know Spanish”, I said, “Can I start with just a few words?” “No”, they said, “the system works only with stories.” “How long does a story have to be?” I asked. “Actually it has to be a particular kind of story”, they said. “What kind?” I asked. There were a few more iterations, but eventually it came out: the “system” translated one particular story from Spanish into English! I’m not sure if my response included an expletive, but I wondered what kind of science, technology, or anything else this was supposed to be. And when Doug told me about his “Automated Mathematician”, this was the kind of thing I was afraid I was going to find.

Years later, I might say, I think there’s something AM could have been trying to do that’s valid, and interesting, if not obviously possible. Given a particular axiom system it’s easy to mechanically generate infinite collections of “true theorems”—that in effect fill metamathematical space. But now the question is: which of these theorems will human mathematicians find “interesting”? It’s not clear how much of the answer has to do with the “social history of mathematics”, and how much is more about “abstract principles”. I’ve been studying this quite a bit in recent years (not least because I think it could be useful in practice)—and have some rather deep conclusions about its relation to the nature of mathematics. But I now do wonder to what extent Doug’s work from all those years ago might (or might not) contain heuristics that would be worth trying to pursue even now.
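To illustrate how mechanical theorem generation is, here is a short Python sketch. As its axiom system it uses Douglas Hofstadter's MIU system from Gödel, Escher, Bach, a standard toy example rather than one discussed in the original text: the axiom is MI, and there are four string-rewriting rules. Function names are illustrative.

```python
from typing import List, Set


def miu_successors(s: str) -> Set[str]:
    """Apply each of the four MIU rules wherever it fits."""
    out: Set[str] = set()
    if s.endswith("I"):
        out.add(s + "U")  # Rule 1: xI -> xIU
    if s.startswith("M"):
        out.add(s + s[1:])  # Rule 2: Mx -> Mxx
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])  # Rule 3: III -> U
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])  # Rule 4: UU -> (nothing)
    return out


def theorems_by_depth(axiom: str, max_depth: int) -> List[Set[str]]:
    """Breadth-first enumeration: layer k holds theorems first derived in k steps."""
    seen = {axiom}
    layers = [{axiom}]
    for _ in range(max_depth):
        fresh: Set[str] = set()
        for s in layers[-1]:
            for t in miu_successors(s) - seen:
                seen.add(t)
                fresh.add(t)
        layers.append(fresh)
    return layers


for depth, layer in enumerate(theorems_by_depth("MI", 3)):
    print(depth, sorted(layer))
```

Every string produced this way is a theorem, and the enumeration never runs out. What it cannot tell you is which of these theorems anyone would find interesting. (The famous string MU never appears, because the number of I's is never a multiple of 3.)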

CYC

I ran into Doug quite a few times in the early to mid-1980s, both around a company called Thinking Machines (to which I was a consultant) and at various events that somehow touched on AI. There was a fairly small and somewhat fragmented AI community in those days, with the academic part in the US concentrated around MIT, Stanford and CMU. I had the impression that Doug was never quite at the center of that community, but was somehow nevertheless a “notable member”, who—particularly with his work being connected to math—was seen as “doing upscale things” around AI.

In 1984 I wrote an article for a special issue of Scientific American on “computer software” (yes, software was trendy then). My article was entitled “Computer Software in Science and Mathematics”, and the very next article was by Doug, entitled “Computer Software for Intelligent Systems”. The summary at the top of my article read: “Computation offers a new means of describing and investigating scientific and mathematical systems. Simulation by computer may be the only way to predict how certain complicated systems evolve.” And the summary for Doug’s article read: “The key to intelligent problem solving lies in reducing the random search for solutions. To do so intelligent computer programs must tap the same underlying ‘sources of power’ as human beings”. And I suppose in many ways both of us spent most of our next four decades essentially trying to fill out the promise of these summaries.

A key point in Doug’s article—with which I wholeheartedly agree—is that to create something one can usefully identify as “AI”, it’s essential to somehow have lots of knowledge of the world built in. But how should that be done? How should the knowledge be encoded? And how should it be used?

Doug’s article in Scientific American illustrated his basic idea:

Encode knowledge about the world in the form of statements of logic. Then find ways to piece together these statements to derive conclusions. It was, in a sense, a very classic approach to formalizing the world—and one that would at least in concept be familiar to Aristotle and Leibniz. Of course it was now using computers—both as a way to store the logical statements, and as a way to find inferences from them.

At first, I think Doug felt the main problem was how to “search for correct inferences”. Given a whole collection of logical statements, he was asking how these could be knitted together to answer some particular question. In essence it was just like mathematical theorem proving: how could one knit together axioms to make a proof of a particular theorem? And especially with the computers and algorithms of the time, this seemed like a daunting problem in almost any realistic case.
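The basic scheme, statements of logic plus a procedure for knitting them together, can be illustrated with a small forward-chaining sketch in Python. The triple format, the variables written with a leading ?, and the single transitivity-style rule are all assumptions for illustration. This is not CYC's representation language (CycL) or its inference engine.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Fact = Tuple[str, ...]
Rule = Tuple[List[Fact], Fact]  # (premises, conclusion); "?x" marks a variable


def unify(pattern: Fact, fact: Fact, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend bindings so pattern matches fact, or return None if impossible."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if result.get(p, f) != f:
                return None
            result[p] = f
        elif p != f:
            return None
    return result


def match_all(premises: List[Fact], facts: Set[Fact],
              bindings: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every set of bindings that satisfies all premises against facts."""
    if not premises:
        yield bindings
        return
    for fact in facts:
        extended = unify(premises[0], fact, bindings)
        if extended is not None:
            yield from match_all(premises[1:], facts, extended)


def forward_chain(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply rules repeatedly until no new facts appear (a fixed point)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            for b in list(match_all(premises, known, {})):  # materialize before mutating known
                derived = tuple(b.get(term, term) for term in conclusion)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


facts = {
    ("isa", "Fido", "Dog"),
    ("subclass", "Dog", "Mammal"),
    ("subclass", "Mammal", "Animal"),
}
rules = [([("isa", "?x", "?a"), ("subclass", "?a", "?b")], ("isa", "?x", "?b"))]
print(sorted(forward_chain(facts, rules)))
```

Even in this tiny case the conclusion that Fido is an animal takes two rounds of chaining; with thousands of statements and rules, deciding which combinations to try becomes the search problem described above.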

But then how did humans ever manage to do it? What Doug imagined was that the critical element was heuristics: strategies for guessing how one might “jump ahead” and not have to do the kind of painstaking searches that systematic methods seemed to imply would be needed. Doug developed a system he called EURISKO that implemented a range of heuristics—that Doug expected could be used not only for math, but basically for anything, or at least anything where human-like thinking was effective. And, yes, EURISKO included not only heuristics, but also at least some kinds of heuristics for making new heuristics, etc.

But OK, so Doug imagined that EURISKO could be used to “reason about” anything. So if it had the kind of knowledge humans do, then—Doug believed—it should be able to reason just like humans. In other words, it should be able to deliver some kind of “genuine artificial intelligence” capable of matching human thinking.

There were all sorts of specific domains of knowledge to consider. But Doug particularly wanted to push in what seemed like the most broadly impactful direction—and tackle the problem of commonsense knowledge and commonsense reasoning. And so it was that Doug began what would become a lifelong project to encode as much knowledge as possible in the form of statements of logic.

In 1984 Doug’s project—now named CYC—became a flagship part of MCC (Microelectronics and Computer Technology Corporation) in Austin, TX—an industry-government consortium that had just been created to counter the perceived threat from the Japanese “Fifth Generation Computer Project”, that had shocked the US research establishment by putting immense resources into “solving AI” (and was actually emphasizing many of the same underlying rule-based techniques as Doug). And at MCC Doug had the resources to hire scores of people to embark on what was expected to be a few thousand person-years of effort.

I didn’t hear much about CYC for quite a while, though shortly after Mathematica was released in 1988 Marvin Minsky mused to me about how it seemed like we were doing for math-like knowledge what CYC was hoping to do for commonsense knowledge. I think Marvin wasn’t convinced that Doug had the technical parts of CYC right (and, yes, they weren’t using Marvin’s theories as much as they might). But in those years Marvin seemed to feel that CYC was one of the few AI projects going on that actually made any sense. And indeed in my archives I find a rather charming email from Marvin in 1992, attaching a draft of a science fiction novel (entitled The Turing Option) that he was writing with Harry Harrison, which contained mention of CYC:

June 19, 2024

When Brian and Ben reached the lab, the computer was running
but the tree-robot was folded and motionless. “Robin,
activate.”

…

“Robin will have to use different concepts of progress for
different kinds of problems. And different kinds of subgoals
for reducing those different kinds of differences.”

“Won’t that require enormous amounts of knowledge?”

“It will indeed—and that’s one reason human education takes
so long. But Robin should already contain a massive amount of
just that kind of information—as part of his CYC-9 knowledge-
base.”

…

“There now exists a procedural model for the behavior of a
human individual, based on the prototype human described in
section 6.001 of the CYC-9 knowledge base. Now customizing
parameters on the basis of the example person Brian Delaney
described in the employment, health, and security records of
Megalobe Corporation.”

A brief silence ensued. Then the voice continued.

“The Delaney model is judged as incomplete as compared to those
of other persons such as President Abraham Lincoln, who has
3596.6 megabytes of descriptive text, or Commander James
Bond, who has 16.9 megabytes.”

Later, one of the novel’s characters observes: “Even if we started with nothing but the old Lenat–Haase representation-languages, we’d still be far ahead of what any animal ever evolved.” (Ken Haase was a student of Marvin’s who critiqued and extended Doug’s work on heuristics.)

I was exposed to CYC again in 1996 in connection with a book called HAL’s Legacy—to which both Doug and I contributed—published in honor of the fictional birthday of the AI in the movie 2001. But mostly AI as a whole was in the doldrums, and almost nobody seemed to be taking it seriously. Sometimes I would hear murmurs about CYC, mostly from government and military contacts. Among academics, Doug would occasionally come up, but rather cruelly he was most notable for his name being used for a unit of “bogosity”—the lenat—of which it was said that “Like the farad it is considered far too large a unit for practical use, so bogosity is usually expressed in microlenats”.

Doug Meets Wolfram|Alpha

Many years passed. I certainly hadn’t forgotten Doug, or CYC. And a few times people suggested connecting CYC in some way to our technology. But nothing ever happened. Then in the spring of 2009 we were nearing the first release of Wolfram|Alpha, and it seemed like I finally had something that I might meaningfully be able to talk to Doug about.

I sent a rather tentative email:

Subject: something you might find interesting…

Date: Thu, 05 Mar 2009 11:15:04 -0500

From: Stephen Wolfram

To: Doug Lenat

We’re in the final stages of a rather large project that I think relates to
some of your interests.

I just made a small blog post about it:

http://blog.wolfram.com/2009/03/05/wolframalpha-is-coming/

I’d be pleased to give you a webconference demo if you’re interested.

I hope you’ve been well all these years.

— Stephen

Doug quickly responded:

Subject: Re: something you might find interesting…

Date: Thu, 5 Mar 2009 13:23:31 -0600

From: Doug Lenat

To: Stephen Wolfram

Hi, Stephen.

You have become a master of understatement! This certainly
does relate to the 1000 person-years we’ve spent building Cyc’s ontology,
knowledge base, and inference engines, over the last 25 years. I’d very
much like to see a webconference demo, so we identify the opportunities for
synergy.

Regards
Doug

It was definitely a “you’re on my turf” kind of response. And I wasn’t sure what to expect from Doug. But a few days later we had a long call with Doug and some of the senior members of what was now the Cycorp team. And Doug did something that deeply impressed me. Rather than for example nitpicking that Wolfram|Alpha was “not AI” he basically just said “We’ve been trying to do something like this for years, and now you’ve succeeded”. It was a great—and even inspirational—show of intellectual integrity. And whatever I might think of CYC and Doug’s other work (and I’d never formed a terribly clear opinion), this for me put Doug firmly in the category of people to respect.

Doug wrote a blog post entitled “I was positively impressed with Wolfram Alpha”, and immediately started inviting us to various AI and industry-pooh-bah events to which he was connected.

Doug seemed genuinely pleased that we had made such progress in something so close to his longtime objectives. I talked to him about the comparison between our approaches. He was just working with “pure human-like reasoning”, I said, like one would have had to do in the Middle Ages. But, I said, “In a sense we cheated”. Because we used all the things that got invented in modern times in science and math and so on. If he wanted to work out how some mechanical system would behave, he would have to reason through it: “If you push this down, that pulls up, then this rolls”, etc. But with what we’re doing, we just have to turn everything into math (or something like it), then systematically solve it using equations and so on.

And there was something else too: we weren’t trying to use just logic to represent the world, we were using the full power and richness of computation. In talking about the Solar System, we didn’t just say that “Mars is a planet contained in the Solar System”; we had an algorithm for computing its detailed motion, and so on.
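As a small illustration of computing a motion rather than asserting facts about it, here is a Python sketch that places Mars in its orbital plane by solving Kepler's equation E − e sin E = M with Newton's method. It assumes a pure two-body orbit with rounded orbital elements (semi-major axis about 1.524 AU, eccentricity about 0.0934, period about 686.98 days). A real ephemeris, presumably including whatever Wolfram|Alpha uses, also has to account for perturbations and the orbit's orientation in space.

```python
import math
from typing import Tuple

# Rounded orbital elements for Mars (approximate values, two-body model only)
MARS_A_AU = 1.524
MARS_E = 0.0934
MARS_PERIOD_DAYS = 686.98


def solve_kepler(mean_anomaly: float, e: float, tol: float = 1e-12) -> float:
    """Solve E - e*sin(E) = M for the eccentric anomaly E by Newton's method."""
    E = mean_anomaly  # a good starting guess when e is small
    for _ in range(50):
        f = E - e * math.sin(E) - mean_anomaly
        E -= f / (1.0 - e * math.cos(E))
        if abs(f) < tol:
            break
    return E


def mars_position(days_since_perihelion: float) -> Tuple[float, float]:
    """(x, y) in AU in the orbital plane, Sun at the origin, perihelion on the +x axis."""
    M = (2.0 * math.pi * days_since_perihelion / MARS_PERIOD_DAYS) % (2.0 * math.pi)
    E = solve_kepler(M, MARS_E)
    x = MARS_A_AU * (math.cos(E) - MARS_E)
    y = MARS_A_AU * math.sqrt(1.0 - MARS_E ** 2) * math.sin(E)
    return x, y


for day in (0, 100, 343.49):
    x, y = mars_position(day)
    print(f"day {day:>6}: x = {x:+.4f} AU, y = {y:+.4f} AU, r = {math.hypot(x, y):.4f} AU")
```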

Doug and CYC had also emphasized the scraps of knowledge that seem to appear in our “common sense”. But we were interested in systematic, computable knowledge. We didn’t just want a few scattered “common facts” about animals. We wanted systematic tables of properties of millions of species. And we had very general computational ways to represent things: not just words or tags for things, but systematic ways to capture computational structures, whether they were entities, graphs, formulas, images, time series, or geometrical forms, or whatever.

I think Doug viewed CYC as some kind of formalized idealization of how he imagined human minds work: providing a framework into which a large collection of (fairly undifferentiated) knowledge about the world could be “poured”. At some level it was a very “pure AI” concept: set up a generic brain-like thing, then “it’ll just do the rest”. But Doug still felt that the thing had to operate according to logic, and that what was fed into it also had to consist of knowledge packaged up in the form of logic.

But while Doug’s starting points were AI and logic, mine were something different—in effect computation writ large. I always viewed logic as something not terribly special: a particular formal system that described certain kinds of things, but didn’t have any great generality. To me the truly general concept was computation. And that’s what I’ve always used as my foundation. And it’s what’s now led to the modern Wolfram Language, with its character as a full-scale computational language.

There is a principled foundation. But it’s not logic. It’s something much more general, and structural: arbitrary symbolic expressions and transformations of them. And I’ve spent much of the past forty years building up coherent computational representations of the whole range of concepts and constructs that we encounter in the world and in our thinking about it. The goal is to have a language—in effect, a notation—that can represent things in a precise, computational way. But then to actually have the built-in capability to compute with that representation. Not to figure out how to string together logical statements, but rather to do whatever computation might need to be done to get an answer.

But beyond their technical visions and architectures, there is a certain parallelism between CYC and the Wolfram Language. Both have been huge projects. Both have been in development for nearly forty years. And both have been led by a single person all that time. Yes, the Wolfram Language is certainly the larger of the two. But in the spectrum of technical projects, CYC is still a highly exceptional example of longevity and persistence of vision—and a truly impressive achievement.

Later Years

After Wolfram|Alpha came on the scene I started interacting more with Doug, not least because I often came to the SXSW conference in Austin, and would usually make a point of reaching out to Doug when I did. Could CYC use Wolfram|Alpha and the Wolfram Language? Could we somehow usefully connect our technology to CYC?

When I talked to Doug he tended to downplay the commonsense aspects of CYC, instead talking about defense, intelligence analysis, healthcare, etc. applications. He’d enthusiastically tell me about particular kinds of knowledge that had been put into CYC. But time and time again I’d have to tell him that actually we already had systematic data and algorithms in those areas. Often I felt a bit bad about it. It was as if he’d been painstakingly planting crops one by one, and we’d come through with a giant industrial machine.

In 2010 we made a big “Timeline of Systematic Data and the Development of Computable Knowledge” poster—and CYC was on it as one of the six entries that began in the 1980s (alongside, for example, the web). Doug and I continued to talk about somehow working together, but nothing ever happened. One problem was the asymmetry: Doug could play with Wolfram|Alpha and Wolfram Language any time. But I’d never once actually been able to try CYC. Several times Doug had promised API keys, but none had ever materialized.

Eventually Doug said to me: “Look, I’m worried you’re going to think it’s bogus”. And particularly knowing Doug’s history with alleged “bogosity” I tried to assure him my goal wasn’t to judge. Or, as I put it in a 2014 email: “Please don’t worry that we’ll think it’s ‘bogus’. I’m interested in finding the good stuff in what you’ve done, not criticizing its flaws.”

But when I was at SXSW the next year Doug had something else he wanted to show me. It was a math education game. And Doug seemed incredibly excited about its videogame setup, complete with 3D spacecraft scenery. My son Christopher was there and politely asked if this was the default Unity scenery. I kept on saying, “Doug, I’ve seen videogames before; show me the AI!” But Doug didn’t seem interested in that anymore, eventually saying that the game wasn’t using CYC—though did still (somewhat) use “rule-based AI”.

I’d already been talking to Doug, though, about what I saw as being an obvious, powerful application of CYC in the context of Wolfram|Alpha: solving math word problems. Given a problem, say, in the form of equations, we could solve pretty much anything thrown at us. But with a word problem like “If Mary has 7 marbles and 3 fall down a drain, how many does she now have?” we didn’t stand a chance. Because to solve this requires commonsense knowledge of the world, which isn’t what Wolfram|Alpha is about. But it is what CYC is supposed to be about. Sadly, though, despite many reminders, we never got to try this out. (And, yes, we built various simple linguistic templates for this kind of thing into Wolfram|Alpha, and now there are LLMs.)
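Here is a hypothetical Python sketch of the kind of narrow linguistic template mentioned in the parenthetical. The pattern and function name are invented for illustration and are not taken from Wolfram|Alpha. It handles the marble problem by mapping it to a subtraction, and it also shows why templates are brittle: a slight rephrasing falls outside the pattern, and there is no commonsense knowledge to fall back on.

```python
import re
from typing import Optional

# One narrow, hypothetical template: "If NAME has N THINGS and K fall ..., how many does she now have?"
LOSS_TEMPLATE = re.compile(
    r"If (?P<name>\w+) has (?P<have>\d+) (?P<thing>\w+) and (?P<lost>\d+) "
    r"(?:fall|fell|roll|rolled)\b.*?how many does (?:she|he) (?:now )?have\?",
    re.IGNORECASE,
)


def solve_loss_problem(text: str) -> Optional[int]:
    """Return the remaining count if the text fits the template, else None."""
    m = LOSS_TEMPLATE.search(text)
    if m is None:
        return None  # no template matched, and nothing else to try
    return int(m.group("have")) - int(m.group("lost"))


print(solve_loss_problem("If Mary has 7 marbles and 3 fall down a drain, how many does she now have?"))  # 4
print(solve_loss_problem("If Mary has 7 marbles and gives 3 to Tom, how many does she have?"))  # None
```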

Independent of anything else, it was impressive that Doug had kept CYC and Cycorp running all those years. But when I saw him in 2015 he was enthusiastically telling me about a deal he was making around CYC that, as I told him, seemed to me too good to be true. A little later there was a strange attempt to sell us the technology of CYC, and I don’t think our teams interacted again after that.

I personally continued to interact with Doug, though. I sent him things I wrote about the formalization of math. He responded pointing me to things he’d done on AM. On the tenth anniversary of Wolfram|Alpha Doug sent me a nice note, offering that “If you want to team up on, e.g., knocking the Winograd sentence pairs out of the park, let me know.” I have to say I wondered what a “Winograd sentence pair” was. It felt like some kind of challenge from an age of AI long past (apparently it has to do with identifying pronoun reference, which of course has become even more difficult in modern English usage).

And as I write this today, I realize a mistake I made back in 2016. I had for years been thinking about what I’ve come to call “symbolic discourse language”—an extension of computational language that can represent “everyday discourse”. And—stimulated by blockchain and the idea of computational contracts—I finally wrote something about this in 2016, and I now realize that I overlooked sending Doug a link to it. Which is a shame, because maybe it would have finally been the thing that got us to connect our systems.

And Now There Are LLMs

Doug was a person who believed in formalism, particularly logic. And I have the impression that he always considered approaches like neural nets not really to have a chance of “solving the problem of AI”. But now we have LLMs. So how do they fit in with things like the ideas of CYC?

One of the surprises of LLMs is that they often seem, in effect, to use logic, even though there’s nothing in their setup that explicitly involves logic. But (as I’ve described elsewhere) I’m pretty sure what’s happened is that LLMs have “discovered” logic much as Aristotle did—by looking at lots of examples of statements people make and identifying patterns in them. And in a similar way LLMs have “discovered” lots of commonsense knowledge, and reasoning. They’re just following patterns they’ve seen, but—probably in effect organized into what I’ve called a “semantic grammar” that determines “laws of semantic motion”—that’s enough to often achieve some fairly impressive commonsense-like results.

I suspect that a great many of the statements that were fed into CYC could now be generated fairly successfully with LLMs. And perhaps one day there’ll be good enough “LLM science” to be able to identify mechanisms behind what LLMs can do in the commonsense arena—and maybe they’ll even look a bit like what’s in CYC, and how it uses logic. But in a sense the very success of LLMs in the commonsense arena strongly suggests that you don’t fundamentally need deep “structured logic” for that. Though, yes, the LLM may be immensely less efficient—and perhaps less reliable—than a direct symbolic approach.

It’s a very different story, by the way, with computational language and computation. LLMs are through and through based on language and patterns to be found through it. But computation—as it can be accessed through structured computational language—is something very different. It’s about processes that are in a sense thoroughly non-human, and that involve much deeper following of general formal rules, as well as much more structured kinds of data, etc. An LLM might be able to do basic logic, as humans have. But it doesn’t stand a chance on things where humans have had to systematically use formal tools that do serious computation. Insofar as LLMs represent “statistical AI”, CYC represents a certain level of “symbolic AI”. But computational language and computation go much further—to a place where LLMs can’t and shouldn’t follow, and should just call them as tools.

Doug always seemed to have a very optimistic view of the promise of AI. In 2013 he wrote to me:

Of course you are coming at this from the opposite end of the Chunnel than
we are, but you’re proceeding, frankly, much more rapidly toward us than we
are toward you. I probably appreciate the significance of what you’ve
accomplished more than almost anyone else: when your and our approaches do
meet up, the combination will be the existence of real AI on Earth. I
think that’s the main motivation in your life, as it is in mine: to live to
see real AI, with the obvious sweeping change in all aspects of life when
there is (i) cradle-to-grave 24×7 Aristotle mentoring and advising for
every human being and, in effect, (ii) a Land of Faerie intelligence
effectively present [e.g., that one can converse with] in every door, floor
tile,…every tangible object above a certain microscopic size.) And to
live to see and be users ourselves in an era of massively amplified human
intelligence …

The last mail I received from Doug was on January 10, 2023—telling me that he thought it was great that I was talking about connecting our tech to ChatGPT. He said, though, that he found it “increasingly worrisome that these models train on CONVINCINGNESS rather than CORRECTNESS”, then gave an example of ChatGPT getting a math word problem wrong. His email ended:

Yes, let’s chat again at your convenience… it bothers both of us, I
believe, that our systems aren’t leveraging each other! That just bothers
me more and more as I get old (not just older).

Sadly we never did chat again. We now have a team actively working on symbolic discourse language, and just last week I mentioned CYC to them—and lamented that I’d never been able to try it. And then on Friday I heard that Doug had died. A remarkable pioneer of AI who steadfastly pursued his vision over the whole course of his career, and was taken far too soon.

Remembering the Improbable Life of Ed Fredkin (1934–2023) and His World of Ideas and Stories

Stephen Wolfram — Tue, 22 Aug 2023 20:03:54 +0000

Programmer of the Universe

“OK, so let me tell you…” And so it would begin. A long and colorful story. An elaborate description of a wild idea. In the forty years I knew Ed Fredkin I heard countless wild ideas and colorful stories from him. He always radiated a certain adventurous joy—together with supreme, almost-childlike confidence. Ed was someone who wanted to independently figure things out for himself, and delighted in presenting his often somewhat-outlandish conclusions—whether about technology, science, business or the world—with dramatic showman-like panache.

In all the years I knew Ed, I’m not sure he ever really listened to anything I said (though he did use tools I built). He used to like to tell people I’d learned a lot from him. And indeed we had intellectual interests that should have overlapped. But in actuality our ways of thinking about them mostly didn’t connect much at all. But at a personal and social level it was still always a lot of fun being around Ed and being exposed to his unique intense opportunistic energy—with its repeating themes but ever-changing directions.

And there was one way in which Ed and I were very much aligned: both of our lives were deeply influenced by computers and computing. Ed had started with computers in 1956—as part of one of the very first cohorts of programmers. And perhaps on the basis of that experience, he would still, even at the end of his life, matter-of-factly refer to himself as “the world’s best programmer”. Indeed, so confident was he of his programming prowess that he became convinced that he should in effect be able to write a program for the universe—and make all of physics into a programming problem. It didn’t help that his knowledge of physics was at best spotty (and, for example, I don’t think he ever really learned calculus). But his almost lifelong desire to “program physics” did successfully lead him to the concept of reversible logic, and to what’s now called the “Fredkin gate”. But it also led him to the idea that the universe must be a giant cellular automaton—whose program he could invent.
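The Fredkin gate is a controlled swap: given three bits (c, a, b), it leaves the control bit c unchanged and swaps a and b exactly when c is 1. A short Python sketch (function and variable names are illustrative) checks the properties that make it interesting for reversible logic. It is a bijection on three bits, it is its own inverse, it conserves the number of 1s, and fixing one input to 0 yields an AND.

```python
from itertools import product
from typing import Dict, Tuple

Bits = Tuple[int, int, int]


def fredkin(c: int, a: int, b: int) -> Bits:
    """Controlled swap: if the control bit c is 1, exchange a and b."""
    return (c, b, a) if c else (c, a, b)


# Full truth table over all 8 three-bit inputs
TABLE: Dict[Bits, Bits] = {bits: fredkin(*bits) for bits in product((0, 1), repeat=3)}

is_bijection = len(set(TABLE.values())) == 8  # distinct outputs: no information is lost
is_self_inverse = all(fredkin(*out) == inp for inp, out in TABLE.items())
conserves_ones = all(sum(inp) == sum(out) for inp, out in TABLE.items())
# With b fixed at 0, the third output equals c AND a
gives_and = all(fredkin(c, a, 0)[2] == (c & a) for c in (0, 1) for a in (0, 1))

print(is_bijection, is_self_inverse, conserves_ones, gives_and)  # True True True True
```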

I first met Ed in 1982—on an island in the Caribbean he had bought with money from taking public a tech company he’d founded. The year before, I had started studying cellular automata, but, unlike Ed, I wasn’t trying to “program” them—to be the universe or anything else. Instead, I was mostly doing what amounted to empirical science, running computer experiments to see what they did, and treating them as part of a computational universe of possible programs “out there to explore”. It wasn’t a methodology I think Ed ever really understood—or cared about. He was a programmer (and inventor), not an empirical scientist. And he was convinced—like a modern analog of an ancient Greek philosopher—that by pure thought he could come up with the whole “clockwork” of the universe.
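The kind of computer experiment described here can be sketched in a few lines of Python. This is an elementary cellular automaton, in which each cell's new value is looked up among the binary digits of the rule number using the cell and its two neighbors. The width, the wraparound boundary, and the choice of rule 30 are assumptions of this sketch.

```python
from typing import List


def step(cells: List[int], rule: int) -> List[int]:
    """One update of an elementary cellular automaton with wraparound edges."""
    n = len(cells)
    return [
        # the (left, center, right) neighborhood read as a 3-bit number selects a bit of `rule`
        (rule >> (cells[(i - 1) % n] << 2 | cells[i] << 1 | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(rule: int, width: int, steps: int) -> List[str]:
    """Start from a single black cell and return one text row per step."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = []
    for _ in range(steps + 1):
        rows.append("".join("#" if c else "." for c in cells))
        cells = step(cells, rule)
    return rows


for row in run(30, 31, 15):
    print(row)
```

Nothing here is "programmed" to produce a particular pattern; one simply picks a rule, runs it, and looks at what comes out.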

Central to his picture was the idea that at the bottom of everything was a cellular automaton, with its grid of cells somehow laid out in space. I told Ed countless times that what was known from twentieth-century physics implied this really couldn’t be how things worked at a fundamental level. I tried to interest Ed in my way of using cellular automata. But Ed wasn’t interested. He was going for what he saw as the big prize: using them to “construct the universe”.

Every few years Ed would tell me he’d made progress—and rather dramatically say things like that he’d “found the electron”. I’d politely ask for details. Then start pointing out that it couldn’t work that way. But soon Ed would be telling a story or talking about some completely different idea—about technology, business or something else.

By the mid-1980s I’d discovered a lot about cellular automata. And I always felt a bit embarrassed by Ed’s attempt to use them in what seemed to me like a very naive way for fundamental physics—and I worried (as did happen a few times) that people would dismiss my efforts by identifying them with his.

My own career had begun in the 1970s with traditional fundamental physics. And while I didn’t think cellular automata as such could be directly applied to fundamental physics, I did think that the core computational phenomena I’d discovered through studying cellular automata might be very relevant. And then in the early 1990s I had an idea. In a cellular automaton, space has a fixed grid-like structure. But what if the structure of space is in fact dynamic, and everything in the universe emerges just from the dynamics of that structure? Finally I felt as if there might be a plausible computational foundation for fundamental physics.

I wrote about this in one chapter of my 2002 book A New Kind of Science. I don’t know if Ed ever read what I wrote, but in any case it didn’t seem to affect his idea that the universe was a cellular automaton—and to confuse things further, he told quite a few people that was what I was saying too. At first I found this frustrating—and upsetting—but eventually I realized it was just “Ed being Ed”, and there were still plenty of things to like about Ed.

Nearly twenty years passed. I would see Ed with some regularity. And sometimes I would mention physics. But Ed would just keep talking about his idea that the universe is a cellular automaton. And when we finally made the breakthrough that led in 2020 to our Physics Project, it made me a little sad that I didn’t even try to explain it to Ed. The universe isn’t a cellular automaton. But it is computational. And I think that knowing this would have brought a certain intellectual closure to Ed’s long journey and aspirations around physics.

Ed might have considered physics his single most important quest. But Ed’s life as a whole was filled with a remarkably rich assortment of activities and interests. Computers. Inventions. Companies. Airplanes. MIT. His island. The Soviet Union. Not to mention people, like Marvin Minsky, John McCarthy and Richard Feynman (as well as Tom Watson, Richard Branson, and many more). And he would tell stories about all these people and things, and more. Sometimes (particularly later in his life) the stories would repeat. But with remarkable regularity Ed would surprise me with yet another—often at first hard-to-believe—story about a situation or topic that I had no idea he’d ever been involved in.

But what was the “whole Ed story”? I knew a lot of fragments, often quite colorful. But they didn’t seem to fit together into the narrative of a life. And now that Ed is sadly no longer with us, I decided I should really try to “understand Ed” and his story. A few times over the years I had made efforts to ask Ed for systematic historical accounts—and in 2014 I even recorded many hours of oral history with him. But there was clearly much more. And in writing this piece I found myself going through lots of documents and archives—and having quite a few conversations—and unearthing even more stories than I already knew. And in the end there’s a lot to say—and indeed this has turned into the most difficult and complicated biographical piece I’ve ever written. But I hope that everything I’ve assembled will help tell the often so-wild-you-can’t-make-this-stuff-up story of that most singular individual who I knew all those years.

The Beginning of the Story

Ed never said much to me about his early life. And in fact I think it was only in writing this piece that I even learned he’d grown up in Los Angeles (specifically, East Hollywood). His parents were both (Jewish) Russian immigrants (his father was born in St. Petersburg; his mother in Odessa; they met in LA). His father’s university engineering studies had been cut short by the Russian Revolution, and he now had a one-man wholesale electronic parts business. His mother had in her youth been trained as a concert pianist, and died when Ed was 11, leaving a somewhat fragmented family situation. Ed had a half-sister, 14 years older than him, a brother 6 years older, and a sister a year older. As he told it in later oral histories, he got interested in both machines and money very early, repairing appliances for a fee even as a tween, and soon learning about the idea of owning stock in companies.

But Ed Fredkin’s first piece of public visibility seems to have come in 1948, when he was 13 years old—and it reminds me so much of many of Ed’s later “self-imposed” adventures. There was at that time an exhibition of historic US documents traveling around the country on a train named the Freedom Train. And when the train came to Los Angeles, the young Ed Fredkin decided he had to be the first person to see it:

The Los Angeles Times published his account of his adventure—a younger but “quintessentially Ed” story:

Ed’s record in high school was at best spotty. But as he tells it, he figured out very early a system for improving the odds in multiple-choice tests, and for example in 9th grade got a top score on a newly instituted (multiple-choice) California-wide IQ test. At the end of high school, Ed applied to Caltech (which was only 13 miles away from where he lived), and largely on the basis of his test scores, was admitted. He ended up spending time working various jobs to support himself, didn’t do much homework, and by his sophomore year—before having to pick a major—dropped out. In 2015 Ed told me a nice story about his time at Caltech:

In 1952–53, I was a student in Linus Pauling’s class where he lectured Freshman Chemistry at Caltech. After class, one day, I asked Pauling “What is a superconductor at the highest known temperature?” Pauling immediately replied “Niobium Nitride, 18 Kelvin”. I was puzzled because I had never heard of Niobium, so I looked it up and, with some difficulty found a reference that defined it as a European name for the metal Columbium.

Later that same day, reading a Pasadena newspaper, I saw an article about Pauling: It announced that Pauling had just returned from Europe (London is what I recall) where Pauling, as Chairman of the International Committee on the naming of the elements, had decided that henceforth the metal Columbium would be renamed Niobium.

I recently looked into that matter and discovered that evidently that renaming was part of a USA–Europe Compromise… In Europe it had been Wolfram and Niobium, in the USA it had been Tungsten and Columbium.

Europe got its way re Niobium and the USA got its way re Tungsten… Perhaps it was a flip of a coin? Someone might know.

As a Wolfram, I thought you might be interested (and, of course, perhaps all this is old hat to you…).

(For what it’s worth, I actually didn’t know this “Wolfram story”, though the details weren’t quite as dramatic as Ed said: the “niobium” decision was actually made in 1949, without Pauling specifically involved, though Pauling did indeed travel to London just before the beginning of the 1952 school year.)

With his interest in machinery, Ed had always been keen on cars, and in his freshman year at Caltech, he also decided to learn to fly a plane. Ed’s older brother, Norman, had joined the Air Force five years earlier. And when he left Caltech—in 1954 at age 19—Ed joined the Air Force too. (If he hadn’t done that, he would have been drafted into the Army.) Ed’s brother Norman (who would spend his whole career in aviation) had been involved in the Korean War, particularly doing aerial reconnaissance—here pictured with his plane (and, no, there don’t seem to be any Air Force pictures of Ed himself):

By the time Ed joined the Air Force, the Korean War was over. Ed was assigned to an airbase in Arizona, and by the summer of 1955 he had qualified as a fighter pilot. Ed was never officially a “test pilot”, but he told me stories about figuring out how to take his plane higher than anyone else—and achieving weightlessness by flying his plane in a perfect free-fall trajectory, steering so that an eraser stayed floating in midair in front of him.

By 1956 Ed had been grounded from flying as a result of asthma, and was now at an airbase in Florida as an “intercept controller”—essentially an air traffic controller responsible for guiding fighters to intercept bombers. It was a time when the Air Force was developing the SAGE (Semi-Automatic Ground Environment) air defense system—a huge project whose concept was to use computers to coordinate data from many radars so as to be able to intercept Soviet bombers that might attack the US (cf. Dr. Strangelove, etc.). The center of SAGE development was Lincoln Lab (then part of MIT) in Lexington, MA—with IBM providing computers, Bell (AT&T) providing telecommunications, RAND providing algorithms, etc. And in mid-1956 the Air Force sent a group—including Ed—to test the next phase of SAGE. But as Ed tells it, they were soon informed that actually there would be a one-year delay.

At the time, the SAGE project was busily trying to train people about computers, and some people from the Air Force stayed in the Boston area to participate in this. As Ed tells it, however, he was the only one who didn’t drop out of the training—and over the course of a year it taught him “much of what was then known about computer programming and computer hardware design”. There were at the time only a few hundred people in the world who could call themselves programmers. And Ed was now one of them. (Perhaps he was even “the world’s best”.)

Computers!

Having learned to program, Ed remained at Lincoln Lab, paid by the Air Force, doing what amounted to computational “odd jobs”. Often this had to do with connecting systems together, or coming up with “clever hacks” to overcome particular system limitations. Occasionally it was a little more algorithmic—like when Sputnik was launched in 1957, and Ed got pulled into a piece of “emergency programming” for orbit calculations.

Ed told many stories about “hacking” the bureaucracy at the Air Force (being given a “Secret” stamp so he could read his own documents; avoiding being sent for a year to the Canadian Arctic by finding a loophole associated with his wife being pregnant, etc.)—and in 1958 he left the Air Force (though he would remain a captain in the reserves for many years), but stayed on at Lincoln Lab. Officially he was there as an “administrative assistant”, because—without a degree—that was all they could offer him. But by then he was becoming known as a “computer person”—with lots of ideas. He wanted to start his own company. And (as he tells it) the very first potential customer he visited was an MIT-spinoff acoustics firm called Bolt Beranek & Newman (BBN). And the person he saw there was their “vice president of engineering psychology”—a certain J. C. R. “Lick” Licklider—who persuaded Ed to join BBN to “teach them about computers”.

It didn’t really come to light until he was at BBN, but while at Lincoln Lab Ed had made what would eventually become his first lasting contribution to computer science. He thought of it as a new way of storing textual information in a computer, and he called it “TRIE memory” (after “reTRIEval”). Nowadays we’d call it the trie (or prefix tree) data structure. Here it is for some common words in English made from the letters of “wolf”:

Licklider persuaded Ed to write a paper about tries—which appeared in 1960, and for a couple of decades was essentially Ed’s only academic-style publication:

The paper has a pretty clear description of tries, even with some nice diagrams:

Even in analyzing the performance of tries, there was only the faintest hint of math in the paper—though Ed realized (probably with input from Licklider) that the efficiency of tries would depend on the Shannon-style redundancy of what they were storing, and he ran Monte Carlo simulations to investigate this:

(He explains: “The test program was written in FORTRAN for the IBM 709. The program is composed of 42 subroutines, of which 19 were coded specially for this program and 23 were taken from the library.”)
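To get a feel for why redundancy matters, here is a toy Monte Carlo experiment in Python. It is in the spirit of Ed’s simulations but is not a reconstruction of his 42-subroutine FORTRAN program. Everything about the setup is an assumption chosen purely for illustration: a four-letter alphabet, 500 random strings of 10 letters each, and arbitrarily picked letter weights. The helper names `count_trie_nodes` and `nodes_per_char` are hypothetical. The sketch relies on the fact that each distinct non-empty prefix of a stored string corresponds to exactly one trie node, so counting prefixes counts nodes.

```python
import random


def count_trie_nodes(strings):
    """Return the number of trie nodes (root included) needed to store `strings`.

    Every distinct non-empty prefix of a stored string corresponds to exactly
    one trie node, so counting distinct prefixes counts nodes.
    """
    prefixes = set()
    for s in strings:
        for i in range(1, len(s) + 1):
            prefixes.add(s[:i])
    return len(prefixes) + 1  # +1 for the root node


def nodes_per_char(alphabet, weights, n=500, length=10, seed=1):
    """Monte Carlo estimate of trie nodes needed per stored character.

    Draws `n` random strings of `length` letters, each letter chosen
    independently with the given relative `weights` (illustrative parameters).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    strings = [
        "".join(rng.choices(alphabet, weights=weights, k=length))
        for _ in range(n)
    ]
    return count_trie_nodes(strings) / (n * length)


uniform = nodes_per_char("abcd", [1, 1, 1, 1])   # maximal entropy per letter
skewed = nodes_per_char("abcd", [97, 1, 1, 1])   # highly redundant "text"
print(f"uniform: {uniform:.3f} nodes/char, skewed: {skewed:.3f} nodes/char")
```

With heavily skewed (more redundant) letter frequencies, the strings share long prefixes, so one should expect far fewer nodes per stored character than with uniform frequencies.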

Tries didn’t make a splash when Ed first introduced them—not least because computers didn’t really have the memory then to make use of them. I think I first heard about them in the late 1970s in connection with spellchecking, and nowadays they’re widely used in lots of text search, bioinformatics and other applications.
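For readers who haven’t met the structure, here is a minimal Python sketch of a trie. It is a common modern way of writing one (a dictionary of children per node, plus an end-of-word flag), not Ed’s 1960 formulation. The class and method names are hypothetical, and the word list (a few words made from the letters of “wolf”) is chosen just for illustration. The prefix query at the end is the kind of operation that makes tries handy for spellchecking and text search.

```python
class TrieNode:
    """A trie node: children keyed by the next character, plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # counts the root node

    def insert(self, word):
        """Add `word`, creating nodes only for the part not already shared."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
                self.node_count += 1
            node = node.children[ch]
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following `prefix`, or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def words_with_prefix(self, prefix):
        """Return all stored words starting with `prefix`, in sorted order."""
        results = []
        start = self._find(prefix)
        if start is not None:
            self._collect(start, prefix, results)
        return results

    def _collect(self, node, path, results):
        # Pre-order walk with children visited in sorted order yields sorted words.
        if node.is_word:
            results.append(path)
        for ch in sorted(node.children):
            self._collect(node.children[ch], path + ch, results)


trie = Trie()
for w in ["of", "low", "owl", "flow", "fowl", "wolf"]:  # illustrative word list
    trie.insert(w)
print(trie.words_with_prefix("f"))  # ['flow', 'fowl']
print(trie.contains("fo"))          # False: "fo" is only a prefix
```

Because shared prefixes are stored once (“flow” and “fowl” share the “f” node; “of” and “owl” share “o”), these six words, with 20 letters in total, need only 18 nodes below the root.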

Ed had apparently first started talking about tries when he was still in the Air Force. As he explained it to me in 2014:

The Air Force [people] had no idea [what I was talking about]. But I kept on [saying] “I need to find someone who knows something about this that can critique it for me.” And someone says to me, “There’s a guy at MIT who deals in something similar, he calls it lists”. And that was John McCarthy. So, I call up, I get a secretary and, you know, I make a date, and I go to MIT and in building 56 with the computation center, I go to his office and the secretary says he’s somewhere out in the hall. I see some guy wandering back and forth. I go up and say, “You John McCarthy?” He says, “Yes.” So, I say, “I’ve had this idea—” I can’t remember if I was in uniform or not; I might’ve been. I said, “I had this idea, and I’ve written a program and tested it. And might you take a look?” Then he takes this thing, and he starts to read it.

Then he did something that struck me as very weird. He turned around slowly and started walking away, he’s reading and walk, walk, walk, walk, stop. Turns around, walk, walk, walk, walk, back slowly, you know. Finally, he comes back and he stops and he reads and reads. And he’s obviously angry. And I thought, “This is weird.” I said “Does it make sense or anything?” He says, “Yes, it makes sense.” And I said, “Well, what’s up?” He says, “Well, I’ve had the same idea.” And I said, “Oh.” He says, “But I’ve never written it down.” And I said, “Oh, okay. So, do you think I ought to work on it or do something?” He says, “Yeah”. So, that’s how I met John McCarthy.

Ed remained friends with McCarthy for the rest of McCarthy’s life, and involved him in many of his endeavors. In 1956 McCarthy had been one of the organizers of the conference that coined the term “artificial intelligence”, and in 1958 McCarthy began the development of LISP (which was based on linked lists). I have to say I wish I’d known Ed’s story with McCarthy much earlier; I would have handled my own interactions with McCarthy differently—because, as it was, over the course of various encounters from 1981 to 2003 I never persisted very far beyond the curmudgeon stage.

Back around 1958, the circle of “serious computer people” in the Boston area wasn’t very large—and another was Marvin Minsky (who I knew for many years). Between Ed and Licklider, both McCarthy and Minsky became consultants at BBN, and all of them would have many interactions in the years to come.

But in late 1959 there was another entrant in the Boston computer scene: the PDP-1 computer, designed by a certain Ben Gurley for a new company named Digital Equipment Corporation (DEC) that had essentially spun off from Lincoln Lab and MIT. BBN was the first customer for the PDP-1, and Ed was its anchor user:

John McCarthy had had the “theoretical” idea of timesharing, whereby multiple users could work on a single computer. Ed figured out how to make it practical on the PDP-1, in the process inventing what would now be called asynchronous interrupts (then called the “sequence break system”). And so began a process which led BBN to become a significant force in computing, the creation of the internet, etc.

But in 1961, Ed and a certain Roland Silver, who also worked at BBN, decided to quit BBN—and, strangely enough, to move to Brazil, where they were enamored of the recently elected new president. But when that new president unexpectedly resigned, they abandoned their plan. And when BBN didn’t want them back, Ed decided to start a company, initially doing consulting for DEC. As Ed tells it, he and Roland Silver were such good friends and had so much they talked about that together they couldn’t get anything done, so they decided they’d better split up.

As I was writing this piece, I decided to look up more about Roland Silver—who I found out had been a college roommate of Marvin Minsky’s at Harvard, and had had a long career in math, etc. at MITRE (the holding company for Lincoln Lab). But I also remembered that many years ago I’d received letters and a rather new-age newsletter from a certain “Rollo Silver”:

Could it be the same person? Yes! And in my archives I also found an ad:

Some time after my work on cellular automata in the 1980s, Roland Silver—together with my longtime friend Rudy Rucker—started a newsletter about cellular automata, notably not mentioning Ed, but including a colorful bio for Silver:

“Triple-I” (III)

But back to Ed and his story. It was 1961, and Ed had quit his job at BBN. In 1957, he’d met on a Cape Cod beach a woman from Western Massachusetts named Dorothy Abair (who was at the time working at a beauty salon)—and six weeks later they’d married, and now had a 3-year-old daughter. Ed had already lined up some consulting with DEC, and as Ed tells it, with a little “hacking” of bank loans, etc. he was able to officially start Information International Incorporated (III)—with a tiny office in Maynard, MA (home of DEC). But then, one day he gets a call from the Woods Hole Oceanographic Institution. He drives down to Woods Hole with a certain Henry Stommel—an oceanography professor at Harvard—who tells him about a “vortex ocean model”, and asks Ed if he can program it on a PDP-1 so that it displays ocean currents on a screen. And the result is that III soon has a contract for $10k (about $100k today) to do this.

I might add a small footnote here. Years later I was talking to Ed about the origins of cellular automata, and he told me that a certain Henry Stommel had told him that there were cellular automaton models of sand dunes from the 1930s. At the time—before the web—I couldn’t easily track down who Henry Stommel was (and I had no idea how Ed knew him), and to this day I don’t know what those sand dune models might have been.

But in any case, Ed’s interaction with Woods Hole led to what became III’s first major business: digital reading of film. As Ed tells it:

At Woods Hole … they had these meters which would measure how fast the ocean current was going and which way—and recorded it on 16 mm film with little tiny lights and a little fiber optic thing. And they had built a machine to read that film. I looked at the machine and said “That’ll never work”. And they said “Who are you? Of course it’ll work”, and so on, so forth. OK, so some months later they call me up and say it didn’t work.

I have to tell you this but this is insanely funny. So I decide I’m going to make a film reader and here’s how I’m going to do it. I knew there was a 16 mm projector you could rent from a company and you could stop it and then say “Advance one frame” by clicking and it would just advance one frame at a time. So I thought: say I take the lightbulb out and put a photomultiplier in and point it at the screen of the computer. Then light will come from the screen, go through the lens and be focused on the film, and some would go through the film to the photomultiplier and I would be able to tell how much light got through. And we could write a program to do the rest.

That was my idea, OK.

So not having any money, we rented that projector and I got Digital (DEC) to let me use their milling machine and I bought the photomultiplier tube, and I got Ben Gurley to design the circuitry and connect it to the computer. But there was one more thing. The photomultiplier tube was like a vacuum tube but it had like 16 pins and a very odd connector that no one had. But I thought “Lincoln Labs has parts for everything in their electronics warehouse”. So I called someone I used to work with there, and said “Look, do me a favor and sneak into the parts area, take that part and just give it to me. I’ve ordered one but I’m not going to get it for a while and when I get it I’ll give it to you and you can put it back so it’s not actually a theft.” And he said “OK, I’ll do it” but he asked me why I wanted it and I told him “Well, I’m doing this stuff for Woods Hole to read some film with a computer”.

OK, so he gave me the part and we get it going right away and we’re reading the film, and that solved the problem. But meanwhile this very funny thing happened. Someone from Lincoln Labs found out about all this and said “Hey, you’re reading some kind of film. Is that what you used that thing for?” And I said “Yeah”. And they said “Well, we tried to read some films so we built a gadget and did the same thing you did: we pointed it at the screen of the computer, but we can’t make the software work”. And I said “OK, well, come down and tell me about it”. So they come down and what happens is this. There’s some army people and they have a radar that’s looking at a missile coming in and records on film from an oscilloscope. And they asked could we read this. And to make a long story short they signed another contract….

The whole setup was eventually captured in a patent entitled simply “High-Speed Film Reading”:

And actually this wasn’t Ed’s first patent. That had been filed in 1960, while Ed was at BBN—and it was for a mechanical punched card sorter, with arrays of metal pins and the like, and no computer in evidence:

III ended up discovering that there were many applications—military and otherwise—for film readers. But their Woods Hole relationship led in another direction as well: computer graphics and data visualization. By 1963 there were perhaps 300,000 oceanographic stations recording their data on punched cards, and the idea was to take this data and produce from it a “computer-compiled oceanographic atlas”. The result was a paper:

And with statements like “Only a high-speed computer has the capacity and speed to follow the quickly shifting demands and questions of a human mind exploring a large field of numbers” the paper presented visualizations like:

These various developments put III in the center of the emerging field of film-meets-computers systems. The company grew, moving its center of operations to Los Angeles, not least to be near the Systems Development Corporation (SDC) which RAND had spun off as its software arm in response to the SAGE project.

But Ed was always having new ideas for III, and defining new directions. Ed had brought Minsky and McCarthy into III as board members and consultants, and for example in 1964 III was proposing to SDC a project to make a new version of LISP (and, yes, with no obvious film-meets-computers applications). The proposal gives some insight into the state of III at the time. It says that “From a one-man operation [in 1962], I.I.I. has grown to the point where our gross volume of business for 1964 is in the neighborhood of $1 million [about $10 million today]”. It explains that III has four divisions: Mathematical and Programming Services, Behavioral Science, Operations, and “New York”. It goes on to list various things III is doing: (1) LISP; (2) Inductive Inference on Sequences; (3) Computer Time-Sharing; (4) Programmable Film Readers; (5) The World Oceanographic Data Display System; and (6) Computer Display Systems.

It’s certainly an eclectic collection, reflecting, as such things often do, the character of the company’s founder. From a modern perspective, one item that catches one’s attention is:

One can think of it as an early attempt at AI/machine learning, on a problem that 60 years later still hasn’t been solved. (GPT-4 says the next letter should be Q, not O.)

But distractions or not, it was a talented team that assembled at III—with lots of cross-fertilization with MIT. III’s business progressively grew, and perhaps it outgrew Ed—and in 1965 Ed stepped down as CEO. In 1968 he left entirely and (as we’ll discuss below) went to MIT, leaving III in the hands of Al Fenaughty, who, years later (and after nearly 30 years at III), would become the chairman of Yandex.

As someone who’s curious about the ways of company founders, I asked Ed many times about his departure from III. He usually just said: “I had a partner who died”. But it’s only now that I’ve pieced together, partly from my 2014 oral history with Ed, what happened. Ed described it to me as the greatest tragedy of his life.

Shortly after he set up III, Ed persuaded Ben Gurley (designer of the PDP-1) to leave DEC and join him at III. I think Ed had hoped to build computers at III, with Gurley as their designer. But on November 7, 1963, in Concord, MA, just a few miles from where I am as I write this, Ben Gurley was murdered—by a single revolver shot through his dining room window as he was about to sit down for dinner with his wife and 7 children. An engineer from DEC (and Lincoln Lab)—about whom Gurley had recently complained to the police—was arrested, and eventually convicted of the crime (after Ed hired a private detective to help). It later turned out that a few years earlier the same engineer was likely also responsible for shooting (though not killing) another engineer from DEC.

I had always assumed that Ed’s decision to leave III happened just after his “partner had died”. But I now realize that Gurley’s death early in the history of III caused III to go on its path of making things like film readers, rather than the DEC- or IBM-challenging computers I think Ed had hoped for.

Even after Ed left active management of III, he was still its chairman. And in late 1968 something would happen that would change his life forever. Taking tech companies public on the “over-the-counter” market had become a thing, and a broker offered to take III public. And on November 26, 1968, III filed its SEC paperwork:

III’s “principal product to date” is described as a “programmable film reader”, but the paperwork notes that as of October 31, 1968, the company has no film readers on order—though there are orders for its new microfilm reader, which it hasn’t delivered yet. It also says that proceeds from the offering will be used to fund its “proposed optical character recognition project”. But for our purposes what’s perhaps more significant is that the paperwork records that Ed owns 57.7% of the company, with the Edward Fredkin Charitable Foundation owning 0.4%.

On January 8, 1969, III went public, and Ed was suddenly, at least on paper, worth more than $10M (or more than $80M today). Two years later (perhaps as soon as a lockup period expired), Ed cashed out, with the SEC notice indicating that Ed would be “repaying personal indebtedness to a bank incurred by him for reasons unrelated to the company or its business” (presumably a loan he’d taken out before he could achieve liquidity):

So now Ed—at age 37—was wealthy. And in fact the money he made from III would basically last the rest of his life, even through a long sequence of subsequent business failures.

III’s OCR project was never a great success, but III became a key company in digital-to-film systems (relevant to both movies and printing), and in the early 1970s created some of the very first computer-generated special effects, which eventually made it into movies like Star Wars. III’s stock price hovered around $10 per share for years, and in 1996—after PostScript had pretty much taken the market for prepress printing systems—III was sold to Autologic for $35M in stock, then in 2001 Autologic was sold to Agfa for $42M.

The Island

When III went public in 1969 it was the height of the Cold War (which probably didn’t hurt III’s military sales). And many people—including Ed—thought World War III might be imminent. And so it was that in 1970 Ed decided to buy an island in the Caribbean, close enough to the tropics that (as he later told me) he assumed, incorrectly according to current models, radioactive fallout from a nuclear war wouldn’t reach it.

Apparently Ed was sitting in a dentist’s office when he saw an “Island for Sale” ad in a newspaper. The seller was a shipwreck-scavenging treasure hunter named Bert Kilbride—sometimes called “the last pirate of the Caribbean”—who had started to develop the island (and for several years would manage it for Ed). It’s a fairly small island (about 125 acres, or 0.2 square miles)—in the British Virgin Islands. And its name is Mosquito Island (or sometimes, with some historical justification, Moskito Island). And when Ed bought it, it probably cost something under $1M. (Richard Branson bought the nearby but smaller Necker Island in 1978.)

I visited Ed’s island in January 1982—the first time I met Ed. And, yes, there was a certain “lair of a Bond villain” (think: Dr. No) vibe to the whole thing. Here are pictures I took from a boat leaving the island (notice the just-visible seaplane parked at the island):

There was a small resort (and restaurant) on the island, named Drake’s Anchorage (built by the previous owner):

And, yes, there were beaches on the island (though I myself have never been much of a beach-goer):

And, in keeping with the Bond vibe, there was a seaplane too:

There was one house on the island, here pictured from the plane (it so happened that when I visited the island, I was learning to fly small planes myself—so I was interested in the plane):

Visiting a nearby island—with its very rundown airport sign—gives some sense of the overall area:

Ed claimed it was difficult to run the resort on his island, not least because, he said, “the British Virgin Islands have the lowest average worker productivity in the world”. But he nevertheless, for example, had a functioning restaurant, and here I am there in 1982, along with Charles Bennett, about whom we’ll hear more later:

When people talked about Ed, his island was often mentioned, and it projected a general image of overall mystique and extreme wealth. In 1983 a movie called WarGames came out, featuring a reclusive military-oriented computer expert named “Professor Falken”—who had an island. Many people assumed Falken was based on Fredkin (and it now says so all over the internet). However, in writing this piece, I decided to find out what was actually true—so I asked one of the writers of the movie, Walter Parkes. He responded, and, yes, fact is often even stranger than fiction:

Unfortunately I can confirm that Ed was not the inspiration for Stephen Falken. The character was inspired by Steven [sic] Hawking. (Falken = Falcon = Hawking) The movie was first conceived to be about two characters, a young super-genius born into a family incapable of acknowledging his gifts, and a dying scientist in need of a protégé. In the first several drafts Falken was confined to a wheel-chair and was working on understanding the big bang, for which he had created a computer simulation. Little known fact—while writing the character, we had one person in mind to play the role: John Lennon, who was murdered shortly before we finished the script.

(By the way, in a moment of “fact follows fiction”, WarGames featured a computer with lots of flashing lights. I happened to see the movie with Danny Hillis, and as we were walking out of the movie, I said to Danny “Perhaps your computer should have flashing lights too”. And indeed flashing lights became a signature feature of Danny’s Connection Machine computer, as later seen in movies like Jurassic Park.)

Project MAC

After he left III in 1968, Ed’s next stop would be MIT, and specifically Project MAC (the “Multiple Access Computer” Project). But actually Ed had already been involved much earlier with Project MAC. In many ways the project was a follow-on to what Ed had been doing at BBN on timesharing.

In 1963 Ed wrote a long survey article on timesharing:

The introduction contains a rather charming window onto the view of computers at the time:

And the ads interspersed through the article give a further sense of the time:

As illustrations of what can be done with an interactive timeshared computer, there’s a picture from Ed’s vortex ocean simulation—as well as an example of an online “book” about LISP:

And, yes, already a kind of “cloud computing” story:

There’s also a description of Project MAC—that had just been funded by the Advanced Research Projects Agency (now DARPA). The article said that the “MAC” stood either for “Multiple Access Computer” or “Machine-Aided Cognition”. It included various sections on what might be possible with timesharing:

The main text of the article ends with a rousing (?) vision of AI taking over from humans (and, yes, even though this is from 60 years ago it’s not so different from what at least some people might say about the “AI future” today):

But there’s a curious piece of backstory to Project MAC—from 1961—that appears as a footnote to Ed’s article:

Ed told me versions of this story many times. McCarthy had failed to get tenure at MIT, and was looking for another job. (Yes, in retrospect this seems remarkable given all the things he’d already done by then. But those things were computer science—and MIT didn’t yet have a CS department; McCarthy was in the EE department.) Ed, Minsky and McCarthy were going to an SDC meeting in Los Angeles, and while he was out there McCarthy was going to interview at Caltech (his undergraduate alma mater). They had a free evening, and Ed suggested they meet “someone interesting”. Ed remembered Linus Pauling from his time at Caltech. But Pauling wasn’t in. So Minsky suggested they call Richard Feynman. And he was in, and invited them over to his house.

Feynman apparently showed them things like his nanotech-inspiring tiny motor, etc., but somehow the discussion shifted to AI. And Minsky mentioned work a student of his was doing on the “AI problem” of symbolic integration. Then McCarthy started to explain ways a computer could do algebra. Then, as Ed told it to me in 2014:

Feynman produces this sheaf of papers to show us. It was all algebra. And he says “There’s a problem. I’ve done this calculation, and it’s close to 50 pages. A graduate student has done it too, and Murray Gell-Mann has done it. And the only thing we know for sure is that our three results are mutually inconsistent. And the only conclusion we can arrive at is that a person can’t do this much algebra with the hope of getting it right.” And so the question was could there be some system that could help do a problem like that? So what happened is Marvin [Minsky] and I basically fleshed out the idea of a mathematical thing. And it was agreed that we would do it. Marvin and I decided to divide this task up, that I would do one part, and he would do another. Now, we had one bad idea in there, OK. It’s partly Feynman’s fault, but it’s also Marvin and my fault. He was convinced you could not do [math] by typing it. It had to have some kind of handwriting recognition. So, it was decided I would do the handwriting recognition…

And although I didn’t know this until I was writing this piece, it turns out the original proposal for Project MAC was actually based on the idea of building a system for mathematics, and “Project MAC” was originally the “Project on Mathematics and Computation”. Pretty soon, though, the emphasis of Project MAC would shift to the “infrastructure” of timeshared computing. But there was still a math effort, which in time became the MACSYMA system for computer algebra (written in LISP by students and grandstudents of Minsky).

And here this intersects with my personal story. Because many years later (starting in 1976) I would use that system—along with other early computer algebra systems—to do all sorts of physics calculations. My archives still contain an example of what it was like in 1980 to log in to “Project MAC” over the ARPANET (my username was “swolf” in those days; note the system message, the presence of 15 MITishly-named “lusers” altogether, and yes, mail):

But, actually, in late 1979 I had already decided to “do my own thing” and build my own system for doing mathematical computation, and eventually much more. And indeed when I first met Ed in 1982 I had recently finished the first version of SMP, and to commercialize it I had started my first company. In 1986 I started to build Mathematica (and what’s now Wolfram Language)—which was released in 1988. Ed started using Mathematica very soon after it was released, and basically continued to do so for the rest of his life.

But picking up the original Project MAC narrative from 1963: the old group from BBN had dispersed but were still writing together about timesharing (and when they said a “debugging system” they meant essentially what we would now call an operating system):

And when Project MAC launched in 1963, its “steering committee” included Minsky, Gurley—and Ed. (John McCarthy had landed at Stanford, where he would remain for the rest of his life. I first met him in 1981, at a time when Stanford was trying to recruit me. There was a lunch with the CS department; people went around the room and introduced themselves. McCarthy unhelpfully—and confusingly—said he was “John Smith”.)

Ed at MIT

In 1968, Ed left III—and Minsky, together with Licklider (who had by then become director of Project MAC), persuaded the MIT EE department to hire Ed as a visiting professor for the year. Ed had been spending most of his time at III in Los Angeles, but III also had a pied-à-terre in the Boston area, and indeed its IPO documents listed its address as 545 Technology Square, Cambridge—the very building in which Project MAC was located.

At MIT, Ed invented and taught a freshman course on “Problem Solving”. He told me many times about one of his favorite “problem exercises”. Imagine there’s a person who can cure anyone who’s sick just by touching them. How could one set things up to make the best use of this? I must say I never find such implausible hypotheticals terribly interesting. But Ed was proud of a solution that he’d come up with (I think in discussion with Minsky and McCarthy) that involved systematically shuttling millions of people past the healer.

This probably didn’t come from that particular course, but here are some notes I found in an archive of Ed’s papers at MIT that perhaps suggest some of the flavor of the course (we’ll talk about Ed’s interest in the Soviet Union later):

In 1968 MIT—and Project MAC in particular—was at the very center of emerging ideas about computer science and AI. A picture from that time captures Ed (third from left) with a few of the people involved: Claude Shannon, John McCarthy and Joe Weizenbaum (creator of ELIZA, the original chatbot):

At the end of the 1968 academic year, student reviews from Ed’s course were unexpectedly good, and MIT needed faculty members who could be principal investigators on the government grants that were becoming plentiful for computing—and one of those typical-for-Ed “surprising things” happened: MIT agreed to hire him as a full professor with tenure, despite his lack of academic qualifications. It was a watershed moment for Ed, and I think a piece of validation that he carried with pride for the rest of his life. (For what it’s worth, while Ed was an extreme case, MIT was at that time also hiring at least some other people without the usual PhD qualifications into CS professor positions.)

In 1971 Licklider stepped down from his position as director of Project MAC—and Ed assumed the position. His archives from the time contain lots of administrative material—studies, reports, proposals, budgets, etc.—including many pieces reflecting things like the birth of the ARPANET, the maturing of operating systems and the general enthusiasm about the promise of AI.

One item (conceivably from an earlier time) is Ed’s summary of “Information Processing Terminology” for PDP-1 users, complete with definitions like: “A bit is a binary digit or any thing or state that represents a binary digit. Equivalently, a bit is a set with exactly two members. Note that a bit is not one of the members of such a set”:

Ed does not seem to have been very central to the intellectual activities around Project MAC, and the emerging Lab for Computer Science and AI Lab. But his name shows up from time to time. And, for example, in the classic “HAKMEM” collection of 191 math and CS “hacks” from the AI Lab, there are two—both very number oriented—attributed to Ed:

Rollo Silver gets mentioned too—notably in connection with “random number generators” involving XORs (and, yes, the code is assembly code—for a PDP-10):

Also in HAKMEM is the “munching squares” algorithm—that I was later shown by Bill Gosper:

And talking of Gosper (whom I’ve known since 1979, and who almost every week seems to send me mail with a surprising new piece of math he’s found with Mathematica): in 1970 the Game of Life cellular automaton had come on the scene, and Gosper and others at MIT were intensely studying it, with Gosper triumphantly discovering the glider gun in November 1970. Curiously—in view of all his emphasis on cellular automata—Ed doesn’t seem to have been involved.

But he did do other things. In 1972, for example, as a kind of spinoff from his Problem Solving course, he formed a group called “The Army to End the War” (i.e. the Vietnam War). Its idea was that it was time to stop the government from fighting an unwinnable war, and that this could be achieved by an organization that would coordinate citizens to threaten a run on banks unless the war was ended. Needless to say, though, this didn’t fit very well with the fact that the project Ed ran was funded by the Department of Defense.

Between MIT being what it is, and Ed being who he was, there were often strange things that happened. As Ed tells it, one day he was in Marvin Minsky’s office talking about unrecognized geniuses, and a certain Patrick Gunkel walks in, and identifies himself as such. Ed ended up having a long association with Gunkel, who produced such documents as:

(Gunkel’s major goal was to create what he called “ideonomy”, or the “science of ideas”, with divisions like isology, chorology, morology and crinology. I met Gunkel once, in Woods Hole, where he had become something of a local fixture, riding around town with his cat in his bicycle basket.)

But after a few years as director of Project MAC, in 1974 Ed was onto something new: being a visiting scholar at Caltech. After his 1961 encounter, he had gotten to know Richard Feynman—who always enjoyed spending time with “out of the box” people like Ed. And so in 1974 Ed went for a year to Caltech, to be with Feynman.

The Universe as a Cellular Automaton

My own efforts (and successes) with cellular automata may have had something to do with it. But I think at least in the later part of his life, Ed felt his greatest achievements related to cellular automata, and in particular to his idea that the universe is a giant cellular automaton. I’m not sure when Ed really first hatched this idea, or indeed started to think about cellular automata. Ed had told me many times that when he’d told John McCarthy “the idea”, McCarthy suggested testing it by looking for “roundoff error” in physics, analogous to roundoff error from finite precision in computers. Ed scoffed at this, accusing McCarthy of imagining that there was literally “an IBM 709 computer in the sky”. And Ed’s implication was that he had gotten further than that, imagining the universe to be made more abstractly from a cellular automaton.

I didn’t know quite when this exchange with McCarthy was supposed to have taken place (and, by the way, some of the emerging experimental implications of our Physics Project are precisely about finding evidence of discrete space through something quite analogous to “roundoff errors” in the equations for spacetime). But Ed’s implication to me was always that he’d started exploring cellular automata sometime before 1960.

In the mid-1990s, while researching history for my book A New Kind of Science, I had (as I’ll discuss below) a detailed email exchange and long phone conversation with Ed about this. The result was a statement in my notes about the history of cellular automata:

At the time, Ed made it sound very convincing. But in writing this piece, I’ve come to the conclusion it’s almost certainly not correct. And of course that’s disappointing given all the effort I put into the history notes in my book, and the almost complete lack of other errors that have surfaced even after two decades of scrutiny. But in any case, it’s interesting to trace the actual development of Ed’s ideas.

One useful piece of evidence is a 25-page document from 1969 in his archives, entitled “Thinking about New Things”—that seems to outline Ed’s thinking at the time. Ed explains “I am not a Physicist, in fact I know very little about modern physics”—but says he wants to suggest a new way of thinking about physics:

Soon he starts talking about the possibility that the universe is “merely a simulation on a giant computer”, and relates a version of what he told me about his interaction with John McCarthy:

He talks (in a rather programmer kind of way) about the beginning of the universe:

He goes on—again in a charmingly “programmer” way:

A bit later, Ed is beginning to get to the concept of cellular automata:

And there we have it: Ed gets to (3D) cellular automata, though he calls them “spatial automata”:

And now he claims that spatial automata can exhibit “very complex behavior”—although his meaning of that will turn out to be a pale shadow of what I discovered in the early 1980s with things like rule 30:

But at this point Ed already seems to think he’s almost there—that he’s almost reproduced physics:

A little later he’s discussing doing something very much in my style: enumerating possible rules:

And still further on he actually talks about 1D rules. And in some sense it might seem like he’s getting very close to what I did in the early 1980s. But his approach is very different. He’s not doing “science” and “empirically seeing what cellular automata do”. Or even being very interested in cellular automata for their own sake. Instead, he’s trying to engineer cellular automata that can “be the universe”. And so for example he wants to consider only left-right symmetric cellular automata “because the universe is isotropic”. And having also decided he wants cellular automata that are symmetric under interchange of black and white (a property he calls “syntactic symmetry”), he ends up with just 8 rules. He could just have simulated these by running them on a computer. But instead he tries to “prove” by pure thought what the rules will do—and comes up with this table:
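The count of 8 is easy to check mechanically. Here is a minimal Python sketch, assuming (as the count suggests) that Ed's rules are the 256 nearest-neighbor, two-color 1D rules, numbered in the scheme used in the pictures here, where bit 4a+2b+c of the rule number gives the new value for the neighborhood a, b, c. The function names are hypothetical.

```python
# Sketch: enumerate 1D two-color, nearest-neighbor rules that are both
# left-right symmetric and black-white ("syntactically") symmetric.
# Assumption: these are the 256 elementary rules, numbered so that bit
# 4*a + 2*b + c of the rule number is the new center value for neighborhood (a, b, c).
from itertools import product


def rule_table(rule):
    """Map each neighborhood (a, b, c) to the new value of the center cell."""
    return {(a, b, c): (rule >> (4 * a + 2 * b + c)) & 1
            for a, b, c in product((0, 1), repeat=3)}


def is_left_right_symmetric(rule):
    """Mirroring a neighborhood never changes the outcome."""
    table = rule_table(rule)
    return all(table[(a, b, c)] == table[(c, b, a)] for a, b, c in table)


def is_black_white_symmetric(rule):
    """Complementing every cell complements the outcome."""
    table = rule_table(rule)
    return all(table[(1 - a, 1 - b, 1 - c)] == 1 - table[(a, b, c)]
               for a, b, c in table)


symmetric_rules = [r for r in range(256)
                   if is_left_right_symmetric(r) and is_black_white_symmetric(r)]
print(symmetric_rules)  # [23, 51, 77, 105, 150, 178, 204, 232]
```

Two of the eight are trivial: rule 204 leaves every cell unchanged, and rule 51 simply inverts it at each step.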

Had he done simulations he might have made pictures like these (labeled using my rule-numbering scheme):
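Running such rules takes only a few lines. The minimal Python sketch below (function names hypothetical) assumes the same rule numbering and a cyclic row whose edges wrap around, and prints text renderings for two of the eight rules: rule 150, where each cell becomes the sum mod 2 of itself and its two neighbors, and rule 232, the majority rule.

```python
# Sketch: run an elementary (1D, two-color, nearest-neighbor) cellular automaton
# from a single black cell and print it as text. Assumptions: the standard
# rule numbering, and a cyclic row so the edges wrap around.

def step(cells, rule):
    """Apply the rule once to every cell of a cyclic row."""
    n = len(cells)
    return [(rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]


def evolve(rule, width=31, steps=15):
    """Return the list of rows, starting from a single black cell in the middle."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows


for rule in (150, 232):
    print(f"rule {rule}")
    for row in evolve(rule):
        print("".join("#" if cell else "." for cell in row))
```

Rule 150 builds up a nested pattern from the single cell, while under rule 232 the single cell simply disappears on the first step.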

But as it was he didn’t really come to any particular conclusion, other than what amount to a few simple “theorems” about what “data processing” these cellular automata can do:

I must say I find it very odd that—particularly given all the stories about his activities and achievements he told me—Ed never in the four decades I knew him mentioned anything about having thought about 1D cellular automata. Perhaps he didn’t remember, or perhaps—even after everything I wrote about them—he never really knew that I was studying 1D cellular automata.

But in any case, what comes next in the 1969 document is Ed getting back to “pure thought” arguments about how cellular automata might “make physics”:

It’s a bit muddled (though, to be fair, this was a document Ed never published), but at the end it’s basically saying that if the universe really is just a cellular automaton then one should be able to replace physical experiments (that would, for example, need particle accelerators) with “digital hardware” that just runs the cellular automaton. The next section is entitled “The Design of a Simulator”, and discusses how such hardware could be constructed, concluding that a 1000×1000×1000 3D grid of cells could be built for $50M (or nearly half a billion dollars today).

After that, there’s one final (perhaps unfinished) section that reads a bit like a caricature of “I’ve-got-a-theory-of-physics-too” mechanical models of physics:

But, OK, so what does this all mean? Well, first, I think it makes it rather clear that (despite what he told me) by 1969—let alone 1961—Ed hadn’t actually implemented or run cellular automata in any serious way. It’s also notable that in this 1969 piece Ed isn’t using the term “cellular automaton”. The concept of cellular automata had been invented many times, under many different names. But by 1969 the term “cellular automaton” was pretty firmly established, and in fact 1969 might have represented the very peak up to that point of interest in cellular automata in the world at large. But somehow Ed didn’t know about this—or at least wasn’t choosing to connect with it.

Even at MIT, Frederick Hennie in the EE department had actually been studying cellular automata—albeit under the name “iterative arrays”—since the very beginning of the 1960s. In 1968 E. F. Codd from IBM (who laid the foundations for SQL—and who worked with Ed’s friend John Cocke) had published a book entitled Cellular Automata. Alvy Ray Smith—in the same department as John McCarthy at Stanford—was writing his PhD thesis on “cellular automata”. In 1969 Marvin Minsky and Seymour Papert published their Perceptrons book, and were apparently talking a lot about cellular automata. And, for example, by the fall of 1969 Papert’s student Terry Beyer had written a thesis about the “recognition and transformation of figures by iterative arrays of finite state automata”—under the auspices of Project MAC, presumably right under Ed’s nose. (And, no, the thesis doesn’t mention Ed, though it mentions Minsky.)

Right around that time, though, something happens. Ed had been convinced—probably by Minsky and McCarthy—that any cellular automaton capable of “being the universe” had better be computation universal. And now there’s a student named Roger Banks who’s working on seeing what kind of (2D) cellular automaton would be needed to get computation universality. Banks had found examples requiring far fewer than the 29 states von Neumann and Burks had used in the 1950s. But—as he related to me many times—Ed challenged Banks to find a 2-state example (“implementable purely with logic gates”), and Banks soon found it, first describing it in June 1970:

Banks had apparently been interacting with the “Life hackers” at MIT, and in November 1970 some of the thunder of his result was stolen when Bill Gosper at MIT discovered the glider gun, which suggested that even the rules of the Game of Life (albeit with a 9-cell rather than a 5-cell 2D neighborhood) were likely to be sufficient for computation universality.

But for our efforts to trace history, Banks’s June 1970 report has a number of interesting elements. Its account of the history of cellular automata makes no mention of Ed. But then, in its one mention of Ed, it says:

The “mod-2 rule” that Ed told me he’d simulated in 1961 has finally made an appearance. In an oral history years later, Terry Winograd reported that in 1970 he “went to a lecture of Papert’s in which he described a conjecture about cellular automata [which Winograd] came back with a proof of”.

By January 1971, Banks is finishing his thesis, which is now officially supervised by Ed (even though it’s nominally in the mechanical engineering department):

Most of Banks’s work is presented as what amount to “engineering drawings”, but he mentions that he has done some simulations. I don’t know if these included simulations of the mod-2 rule, but it seems likely.

So was 1969 or 1970 the first time the mod-2 rule had been heard from? I’m not sure, but I suspect so. But to confuse things, there’s a “display hack” known as “munching squares” (described in HAKMEM) that looks in some ways similar, and that was probably already seen in 1962 on the PDP-1. Here are the frames in a small example of munching squares:

Here’s a video of a bigger example:

I expect Ed saw munching squares, perhaps even in 1962. But it’s not the mod-2 rule—or actually a cellular automaton at all. And even though Ed certainly had the capability to simulate cellular automata back at the beginning of the 1960s (and could even have recorded videos of 2D ones with III’s film technology), the evidence we have so far is that he didn’t. And in fact my suspicion is that it was probably only around the time I met Ed in 1982 that it finally happened.
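The difference is easy to see in code. A common description of munching squares (assumed here; this is not a transcription of the HAKMEM or PDP-1 code) is that at step t the display toggles the point (x, x XOR t) for every x. In the minimal Python sketch below (function names hypothetical), each pixel's state depends only on its own coordinates and the step count, never on its neighbors, which is exactly why this is not a cellular automaton.

```python
# Sketch of munching squares, assuming the common description: on an n-by-n
# screen (n a power of two), step t toggles pixel (x, y) with y = x XOR t.
# Nothing here looks at neighboring pixels, so this is not a cellular automaton.

def munching_squares(n, steps):
    """Return the accumulated screen after each step, as lists of rows of 0/1."""
    screen = [[0] * n for _ in range(n)]
    frames = []
    for t in range(steps):
        for x in range(n):
            y = x ^ (t % n)      # stays within 0..n-1 because n is a power of two
            screen[y][x] ^= 1    # XOR the point onto the screen
        frames.append([row[:] for row in screen])
    return frames


def closed_form(n, t):
    """For t < n, pixel (x, y) has been toggled once, at step x XOR y, if that step has come."""
    return [[1 if x ^ y <= t else 0 for x in range(n)] for y in range(n)]


for frame in munching_squares(8, 4):
    print("\n".join("".join("#" if p else "." for p in row) for row in frame))
    print()
```

The closed form makes the point explicit: every pixel is given by a formula in x, y and t alone, whereas in the mod-2 rule each cell's new value is computed from its neighbors' current values.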

My First Encounter with Ed

In May 1981 there’d been a conference at MIT on the Physics of Computation. I’d been invited, but in the end I couldn’t go—because (in a pattern that has repeated many times in my life) it coincided with the initial release of my SMP software system. Still, in December 1981 I got the following invitation:

In January 1982 I was planning to go to England to do a few weeks of intensive SMP development on a computer that a friend’s startup had—and I figured I would go to the Caribbean “on the way”.

It was an interesting group that assembled on January 18, 1982, on Mosquito Island. It was the first time I met my now-longtime friend Greg Chaitin. There were physicists there, like Ken Wilson and David Finkelstein. (Despite the promise of the invitation, Feynman’s health prevented him from coming.) And then there were people who’d worked on reversible computation, like Rolf Landauer and Charles Bennett. There were Tom Toffoli and Norm Margolus, who had their cellular automaton machine with them. And finally there was Ed. At first he seemed a little Gatsby-like, watching and listening, but not saying much. I think it was the next morning that Ed pulled me aside rather conspiratorially and said I should come and see something.

There was just one real house (as opposed to cabin) on the island (with enough marble to clinch the Bond-villain-lair vibe). Ed led me to a narrow room in the house—where there was a rather-out-of-place-for-a-tropical-island modern workstation computer. I’d seen workstation computers before; in fact, the company I’d started was at the time (foolishly) thinking of building one. But the computer Ed had was from a company he was CEOing. It was a PERQ 1, made by Three Rivers Computer Corporation, which had been founded by a group from CMU including McCarthy’s former student Raj Reddy. I learned that Three Rivers was a company in trouble, and that Ed had recently jumped in to save it. I also learned that in addition to any other challenges the engineers there might have had, he’d added the requirement that the PERQ be able to successfully operate on a tropical island with almost 100% humidity.

But in any case, Ed wanted to show me something on the screen. And here’s basically what it was:

Ed pressed a button and now this is what happened:

I’d seen plenty of “display hacks” before. Bill Gosper had shown me ones at Xerox PARC back in 1979, and my archives even contain some of the early color laser printer outputs he gave me:

I don’t remember the details of what Ed said. And what I saw looked like “display hacks flashing on the screen”. But Ed also mentioned the more science-oriented idea of reversibility. And I’m pretty sure he mentioned the term “cellular automaton”. It wasn’t a long conversation. And I remember that at the end I said I’d like to understand better what he was showing me.

And so it was that Ed handed me a PERQ 8” floppy disk. And now, 41 years later, here it is, sitting—still unread—in my archives:

It’s not so easy these days to read something like this—and I’m not even sure it will have “magnetically survived”. But fortunately—along with the floppy—there’s something else Ed gave me that day. Two copies of a 9-page printout, presumably of what’s on the floppy:

And what’s there is basically a Pascal program (and the PERQ was a very Pascal-oriented machine; “PERQ” is said to have stood for “Pascal Engine that Runs Quicker”). But what does the program do? The main program is called “CA1”, suggesting that, yes, it was supposed to do something with cellular automata.

There are a few comments:

And there’s code for making help text:

Apparently you press “b” to “clear the Celluar [sic] Automata boundary”, “n” for “Fredkin’s Pattern” and “p” for “EF1”. And at the end there’s a reference to munching squares. The first pattern above is what you get by pressing “n”; the second by pressing “p”.

Both patterns look pretty messy. But if instead you press “a”, you get something with a lot more structure:

I think Ed showed this to me in passing. But he was more interested in the more complicated patterns, and in the fact that you could get them to reverse what they were doing. And in this animated form, I suspect this just looked to me like another munching squares kind of thing.

But, OK, given that we have the program, can we tell what it actually does? The core of it is a bunch of calls to the function rasterop(). Functions like rasterop() were common in computers with bitmapped displays. Their purpose was to apply a certain Boolean operation to the array of black and white pixels in a region of the screen. Here it’s always rasterop(6, …), which means that the function being applied is Boolean function 6, or Xor (or “sum mod 2”).

And what’s happening is that chunks of the screen are getting Xor’ed together: specifically, chunks that are offset by one pixel in each of the four directions. And this is all happening in two phases, swapping between different halves of the framebuffer. Here are the central parts of the sequence of frames that get generated starting from a single cell:

It helps a lot to see the separate frames explicitly. And, yes, it’s a cellular automaton. In fact, it’s exactly the “reversible mod-2 rule”. Here it is for a few more steps, with its simple “self-reproduction” increasingly evident:
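The structure of such a program can be sketched in Python. This is a minimal sketch under stated assumptions, not a translation of Ed's Pascal: it assumes the rule XORs together the four one-pixel-shifted copies of the current frame (the four Xor rasterops) and then XORs in the frame from one step earlier (the other half of the framebuffer), on a wraparound grid. The function names are hypothetical. Because the older frame is XORed in, applying the very same update with the last two frames swapped retraces the evolution exactly, presumably the kind of reversal Ed liked to demonstrate.

```python
# Sketch of a second-order ("reversible") mod-2 rule on a small wraparound grid.
# Assumption: next = XOR of the four one-cell shifts of the current frame,
# XORed with the previous frame. Grids are lists of rows of 0/1.

def shifted(grid, dy, dx):
    """Copy of grid moved by (dy, dx) cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def xor_all(*grids):
    """Cellwise XOR ("sum mod 2") of equally sized grids."""
    h, w = len(grids[0]), len(grids[0][0])
    return [[sum(g[y][x] for g in grids) % 2 for x in range(w)] for y in range(h)]


def step(previous, current):
    """Advance one step; returns the new (previous, current) pair."""
    neighbors = xor_all(shifted(current, 1, 0), shifted(current, -1, 0),
                        shifted(current, 0, 1), shifted(current, 0, -1))
    return current, xor_all(neighbors, previous)


def run(previous, current, steps):
    """Apply step() repeatedly and return the final (previous, current) pair."""
    for _ in range(steps):
        previous, current = step(previous, current)
    return previous, current


size = 16
empty = [[0] * size for _ in range(size)]
seed = [row[:] for row in empty]
seed[size // 2][size // 2] = 1

prev_frame, frame = run(empty, seed, 6)
# Swapping the last two frames and applying the same rule runs time backward.
restored_seed, restored_empty = run(frame, prev_frame, 6)
print(restored_seed == seed and restored_empty == empty)  # True
```

Nothing special about the single-cell seed is needed for the reversal: the same swap works for any pair of starting frames, since each older frame can always be recovered as the XOR of the next frame with the neighbor sum of the current one.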

Back in 1982 I think I only saw the PERQ that one time. But in one of the resort cabins on the other side of the island there was this (as captured in a slightly blurry photograph that I took):

It was a “cellular automaton machine” built out of “raw electronics” by Tom Toffoli and Norm Margolus—who were the core of Ed’s “Information Mechanics” group at MIT. It didn’t feel much like science, but more like a video DJ performance. Patterns flashing and dancing on the screen. Constant rewiring to produce new effects. I wanted to slow it all down and “sciencify” it. But Tom and Norm always wanted to show yet another strange thing they’d found.

Looking in my archives today, I find just one other photograph I took of the machine. I think I considered this the most striking pattern I saw the machine produce. And, yes, presumably it’s a 2D cellular automaton—though despite my decades of experience with cellular automata I don’t today immediately recognize it:

What did I make of Ed back in 1982? Remember, those were days long before the web, and before one could readily look up people’s backgrounds. So pretty much all I knew was that Ed was connected to MIT, and that he owned the island. And I had the impression that he was some kind of technology magnate (and, yes, the island and the plane helped). But it was all quite mysterious. Ed didn’t engage much in technical conversations. He would make statements that were more like pronouncements—that sounded interesting, but were too vague and general for me to do much more than make up my own interpretations for them. Sometimes I would try to ask for clarification, but the response was usually not an explanation, but instead a tangentially related—though often rather engaging—story.

All these years later, though, one particular exchange stands out in my memory. It was at the end of the conference. We were standing around in the little restaurant on the island, waiting for a boat to arrive. And Ed said out of the blue: “I’ll make a deal with you. You teach me how to write a paper and I’ll teach you how to build a company.” At the time, this struck me as quite odd. After all, writing papers seemed easy to me, and I assumed Ed was doing it if he wanted to. And I’d already successfully started a company the previous year, and didn’t think I particularly needed help with it. (Though, yes, I made plenty of mistakes with that company.) But that one comment from Ed somehow for years cemented my view of him as a business tycoon who didn’t quite “get” science, though he had ideas about it and wanted to dabble in it.

Ed and Feynman

Ed would later describe Richard Feynman as his best friend. As we discussed above, they’d first met in 1961, and in 1974 Ed had spent the year at Caltech visiting Feynman, having, as Ed tells it, made a deal (analogous to the one he later proposed to me) that he would teach Feynman about computers, and Feynman would teach him about physics. I myself first got to know Feynman in 1978, and interacted extensively with him not only about physics, but also about symbolic computing—and cellular automata. And in retrospect I have to say I’m quite surprised that he mentioned Ed to me only a few times in passing, and never in detail.

But I think the point was that Feynman and Ed were—more than anything else—personal friends. Feynman tended to find “traditional academics” quite dull, and much preferred to hang out with more “unusual” people—like Ed. Quite often the people Feynman hung out with had quite kooky ideas about things, and I think he was always a little embarrassed by this, even though he often seemed to find it fun to indulge and explore those ideas.

Feynman always liked solving problems, and applying himself to different kinds of areas. But I have to say that even I was a little surprised when in writing this piece I was going through the archives of Ed’s papers at MIT, and found the following letter from Feynman to Ed:

Clearly he—like me—viewed Ed as an authority on business. But what on earth was this “cutting machine”, and why was Feynman trying to sell it?

For what it’s worth, the next couple of pages tell the story:

Feynman’s next-door neighbor had a company that made swimwear, and this was a machine for cutting the necessary fabric—and Feynman had helped develop it. And much as Feynman had been prepared to help his neighbor with this, he was also prepared to help Ed with some of his ideas about physics. And in the archive of Ed’s papers, there’s a letter from Feynman:

I don’t know whether this is the first place the term “Fredkin gate” was ever used. But what’s here is a quintessential example of Feynman diving into some new subject, doing detailed calculations (by hand) and getting a useful answer—in this case about what would become Ed’s best-known invention: reversible logic, and the Fredkin gate.
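For reference, a Fredkin gate is a controlled swap: it passes a control bit through unchanged and swaps its other two inputs exactly when the control is 1. The minimal Python sketch below (the function name is hypothetical) checks the properties that make it interesting for reversible logic: it is its own inverse, it never changes the number of 1s, and fixing one input to 0 turns it into an AND gate.

```python
# Sketch of a Fredkin (controlled-swap) gate on bits.
from itertools import product


def fredkin(c, a, b):
    """Pass c through; swap a and b when c is 1."""
    return (c, b, a) if c else (c, a, b)


for bits in product((0, 1), repeat=3):
    out = fredkin(*bits)
    assert fredkin(*out) == bits      # applying the gate twice undoes it (reversible)
    assert sum(out) == sum(bits)      # the number of 1s is conserved


# With the third input fixed at 0, the third output is c AND a.
for c, a in product((0, 1), repeat=2):
    print(c, a, fredkin(c, a, 0)[2])
```

Because no information is ever discarded, any circuit built only from such gates can in principle be run backward, which is what connects the gate to the question of the energy cost of computation.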

Feynman had always been interested in “computing”. And indeed when he was recruited to the Manhattan Project it was to run a team of human computers (equipped with mechanical desk calculators). I think Feynman always hoped that physics would “become computational” at least in some sense—and he would for example lament to me that Feynman diagrams were such a bad way to compute things. Feynman always liked the methodology of traditional continuous mathematics, but (as I just noticed) even in 1964 he was saying that “I believe that the theory that space is continuous is wrong, because we get these infinities and other difficulties…”. And elsewhere in his 1964 lectures that became The Character of Physical Law Feynman says:

Did Feynman say these things because of his conversations with Ed? I rather doubt it. But as I was writing this piece I learned that Ed thought differently. As he told it:

I never pressed any issue that would sort of give me credit, okay? It’s just my nature. A very weird thing happened toward the end of my time at Caltech. Richard Feynman and I would get into very fierce arguments. . . . I’m trying to convince him of my ideas, that at the bottom is something finite and so on. He suddenly says to me, “You know, I’m sure I had this same idea sometime quite a while ago, but I don’t remember where or how or whether I ever wrote it down.” I said, “I know what you’re talking about. It’s a set of lectures you gave someplace. In those lectures you said perhaps the world is finite.” He just has this little statement in this book. I saw the book on his shelf. I got it out, and he was so happy to see that there. What I didn’t tell him was he gave that lecture years after I’d been haranguing him on this subject. I knew he thought it was his idea, and I left it that way. That was just my nature.

Notwithstanding what he said, I rather suspect he did push the point. And for example when Feynman gave a talk on “Simulating Physics with Computers” at the 1981 MIT Physics of Computation conference that Ed co-organized, he was careful to write that:

Ed, by the way, arranged for Feynman to get his first personal computer: a Commodore PET. I don’t think Feynman ended up using it terribly much, though in 1984 he took it with him on a trip to Hawaii where he and his son Carl used it to work out probabilities to try to “crack” the randomness of my rule 30 cellular automaton (needless to say, without success).
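For readers who haven't met rule 30: each cell's new value depends only on itself and its two neighbors, and the rule can be written as left XOR (center OR right). Below is a minimal Python sketch that grows the pattern from a single 1 cell and records the center column, one example of the kind of sequence whose randomness is at issue. The function names and the fixed-width padding are illustrative choices, not anything from Feynman's PET program.

```python
from typing import List


def rule30(left: int, center: int, right: int) -> int:
    """Rule 30 as a Boolean formula: new cell = left XOR (center OR right)."""
    return left ^ (center | right)


def rule30_center_column(n: int) -> List[int]:
    """Return the first n values of the center column, starting from a single 1 cell."""
    width = 2 * n + 3                 # wide enough that the pattern never reaches the edges
    mid = width // 2
    cells = [0] * width
    cells[mid] = 1
    column = [cells[mid]]
    for _ in range(n - 1):
        # Edge cells stay 0; each interior cell applies rule 30 to its 3-cell neighborhood.
        cells = ([0]
                 + [rule30(cells[i - 1], cells[i], cells[i + 1]) for i in range(1, width - 1)]
                 + [0])
        column.append(cells[mid])
    return column


print(rule30_center_column(20))
```

The pattern spreads by at most one cell per step on each side, which is why a width of 2n + 3 is enough to make the zero edges irrelevant.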

Digital Physics & Reversible Logic

Back at MIT in 1975 after his year at Caltech, Ed was no longer the director of Project MAC, but was still on the books as a professor, albeit something of an outcast one. Soon, though, he was teaching a class about his ideas—under the title of “Digital Physics”:

Cellular automata weren’t specifically mentioned in the course description—though in the syllabus they were there, with the Game of Life as a key example:

Back in the 1960s, cellular automata had been a popular topic in theoretical computer science. But by the mid-1970s the emphasis of the field had switched to things like computational complexity theory—and, as Ed told me many times, his efforts to interest people at MIT in cellular automata failed, with influential CS professor Albert Meyer (whose advisor Patrick Fischer had worked quite extensively on cellular automata) apparently telling Ed that “one can tell someone is out of it if they don’t think cellular automata are dead”. (It’s an amusing irony that around this time, Meyer’s future wife Irene Greif would point John Moussouris—who we’ll meet later—to Ed and his work on cellular automata.)

Ed’s ideas about physics were not well received by the physicists at MIT. And for example when students from Ed’s class asked the well-known MIT physics professor Philip Morrison what he thought of Ed’s approach, he apparently responded that “Of course Fredkin thinks the universe is a computer—he’s a computer person; if instead he were a cheese merchant he’d think it was a big cheese!”

When Ed was at Caltech in 1974 a big focus there—led by Carver Mead—was VLSI design. And this led to increasing interest in the ultimate limits on computation imposed by physics. Ever since von Neumann in the 1950s it had been assumed that every step in a computation would necessarily require dissipation of energy—and this was something Carver Mead took as a given. But if this was true, how could Ed’s cellular automaton for the universe work? Somehow, Ed reasoned, it—and any computation, for that matter—had to be able to run reversibly, without dissipating any energy. And this is what led Ed to his most notable scientific contribution: the idea of reversible logic.

Ordinary logic operations—like And and Or—take two bits of input and give one bit of output. And this means they can’t be reversible: with only one bit in the output there isn’t enough information to uniquely determine the two bits of input. But if—like Ed—you consider a generalized logic operation that for example has two inputs and two outputs, then this can be invertible, i.e. reversible.
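The counting argument can be checked mechanically: a gate is reversible exactly when no two distinct inputs produce the same output. The sketch below uses a "controlled-NOT" gate, (a, b) → (a, a XOR b), purely as an example of a two-in, two-out invertible operation; it is an illustrative choice, not one of Ed's gates.

```python
from itertools import product


def and_gate(a, b):
    return (a & b,)            # two bits in, one bit out


def cnot_gate(a, b):
    return (a, a ^ b)          # two bits in, two bits out


def is_reversible(gate, n_inputs):
    """A gate is reversible iff distinct inputs always give distinct outputs."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    return len(set(outputs)) == len(outputs)


print(is_reversible(and_gate, 2))   # False: three different inputs all give (0,)
print(is_reversible(cnot_gate, 2))  # True: the four input pairs are just permuted
```

When the numbers of input and output bits are equal, "no collisions" is the same as "the gate permutes the possible states", which is the sense of reversibility used in the rest of this section.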

The concept of an invertible mapping had long existed in mathematics, and under the name “automorphisms of the shift” had even been studied back in the 1950s for the case of what amounted to 1D cellular automata (for applications in cryptography). And in 1973 Charles Bennett had shown that one could make a reversible analog of a Turing machine. But what Ed realized is that it’s possible to make something like a typical computer design—and have it be reversible, by building it out of reversible logic elements.

Looking through the archive of Ed’s papers at MIT, I found what seem to be notes on the beginning of this idea:

And I also found this—which I immediately recognized as a sorting network, in which values get sorted through a sequence of binary comparisons:

Sorting networks are inevitably reversible. And this particular sorting network I recognized as the largest guaranteed-optimal sorting network that’s known—discovered by Milton Green at SRI (then “Stanford Research Institute”) in 1969. It’s implausible that Ed independently discovered this exact same network, but it’s interesting that he was drawing it (by hand) on a piece of paper.
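To make "a sequence of binary comparisons" concrete, here is a small Python sketch of a sorting network. It is not Green's 16-input network; it is a standard 5-comparator network for 4 inputs, chosen only because it is short enough to check by hand. Each pair (i, j) is a compare-exchange that leaves the smaller value on wire i, and the sequence of comparisons is fixed in advance, independent of the data.

```python
from itertools import permutations, product

# A standard 5-comparator sorting network for 4 inputs (not Green's 16-input network).
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def apply_network(network, values):
    """Run values through a fixed sequence of compare-exchange operations."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:                 # compare two wires; put the smaller value on wire i
            v[i], v[j] = v[j], v[i]
    return v


print(apply_network(NETWORK_4, [3, 1, 4, 2]))   # [1, 2, 3, 4]

# Exhaustive check over every ordering of 4 distinct values.
assert all(apply_network(NETWORK_4, p) == sorted(p) for p in permutations(range(4)))
```

The first two comparators sort each pair, the next two pull out the overall minimum and maximum, and the last one orders the two middle values.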

Ed’s archives also contain a 3-page draft entitled “Conservative Logic”:

Ed explains that he is limiting himself to gates that implement permutations

and then goes on to construct a “symmetric-majority-parity” gate—which he claims is “computation universal”:

It’s not quite a Fredkin gate, but it’s close. And, by the way, it’s worth pointing out that these gates alone aren’t “computation universal” in something like the Turing sense. Rather, the point is that—like with Nand for ordinary logic—any reversible logic operation (i.e. permutation) with any number of inputs can be constructed using just these gates, connected by wires.
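Ed's "symmetric-majority-parity" gate isn't reproduced here. Instead, the following Python sketch models the Fredkin gate itself as a controlled swap on three bits and checks the properties discussed above: it permutes the eight 3-bit states, it is its own inverse, and it conserves the number of 1s (the "conservative" in conservative logic). With constant inputs fed in, it also yields And and Not. The helper names are illustrative.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: if the control bit c is 1, exchange a and b; otherwise pass through."""
    return (c, b, a) if c else (c, a, b)


states = list(product((0, 1), repeat=3))
images = [fredkin(*s) for s in states]

# Reversible: the gate permutes the 8 three-bit states, and applying it twice undoes it.
assert sorted(images) == states
assert all(fredkin(*fredkin(*s)) == s for s in states)
# Conservative: the number of 1s is the same before and after.
assert all(sum(s) == sum(fredkin(*s)) for s in states)


def and_via_fredkin(x, y):
    # Feed a constant 0 on the third wire; the third output is x AND y.
    return fredkin(x, y, 0)[2]


def not_via_fredkin(x):
    # Feed constants 0 and 1; the third output is NOT x (the second output is a copy of x).
    return fredkin(x, 0, 1)[2]
```

The "extra" outputs that these constructions leave behind (such as the copy of x) are exactly what lets the overall computation stay reversible.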

Ed didn’t at first publish anything about his reversible logic idea, though he talked about it in his class, and in 1978 there were already students writing term papers about it. But then, also in 1978, as Ed told it later:

I found this guy Tommaso Toffoli. He had written a paper that showed how you could build a reversible computer by storing everything that an ordinary computer would have to forget. I had figured out how to have a reversible computer that didn’t store anything because all the fundamental activity was reversible. Okay? So I decided to hire him because he was the only person who tried to do it and he didn’t succeed, really, and I had—and I hired him to help me.

Toffoli had done a first PhD in Italy building electronics for cosmic ray detectors, and in 1978 he’d just finished a second PhD, working on 2D cellular automata with Art Burks (who had coined the name “cellular automaton”). Ed brought Toffoli to MIT under a grant to build a cellular automaton machine—leading to the machine I saw on Ed’s island in 1982. But Ed also worked with Toffoli to write a paper about conservative logic—which finally appeared in 1982, and contained both the Fredkin gate, and the Toffoli gate. (Ed later griped to me that Toffoli “really hadn’t done much” for the paper—and that after all the Toffoli gate was just a special case of the Fredkin gate.)
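For comparison, here is a Python sketch of the Toffoli gate (controlled-controlled-NOT), which flips its third bit exactly when the first two are both 1. It is also a permutation of the eight 3-bit states, but unlike the Fredkin gate it does not conserve the number of 1s. Fixing the target input to 1 turns it into a Nand. Whether it should count as "just a special case" of the Fredkin gate is not something this sketch tries to settle.

```python
from itertools import product


def toffoli(a, b, c):
    """Controlled-controlled-NOT: flip c exactly when a and b are both 1."""
    return (a, b, c ^ (a & b))


states = list(product((0, 1), repeat=3))

assert sorted(toffoli(*s) for s in states) == states        # a permutation, so reversible
assert all(toffoli(*toffoli(*s)) == s for s in states)      # its own inverse
# Unlike the Fredkin gate, the number of 1s can change: (1, 1, 0) -> (1, 1, 1).
assert toffoli(1, 1, 0) == (1, 1, 1)


def nand_via_toffoli(x, y):
    # With the target wire fixed to 1, the third output is NOT (x AND y).
    return toffoli(x, y, 1)[2]
```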

Back in 1980—on the way to this paper—Ed, with Feynman’s encouragement, had had another idea: to imagine implementing reversible logic not just abstractly, but through an explicit physical process, namely collisions between elastic billiard balls. And as we saw above, Feynman quickly got into analyzing this, for example seeing how a Fredkin gate could be implemented just with billiard balls.
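At the level of logic, the basic element of the billiard-ball model is usually described as an "interaction gate": a 1 on an input path means a ball is arriving there. If two balls arrive at once they collide and leave along two deflected paths; a lone ball passes straight through. The Python sketch below models only that input/output logic under this idealized description (no geometry, mirrors, or timing, and not Feynman's own calculation), and checks that balls are neither created nor destroyed and that distinct inputs give distinct outputs.

```python
from itertools import product


def interaction_gate(p, q):
    """Idealized billiard-ball interaction gate on two input paths p and q.

    Returns the four output paths:
      (both balls, deflected), (p alone, straight), (q alone, straight), (both balls, deflected)
    """
    return (p & q, p & (1 - q), (1 - p) & q, p & q)


outputs = []
for p, q in product((0, 1), repeat=2):
    out = interaction_gate(p, q)
    # Ball conservation: the number of balls leaving equals the number arriving.
    assert sum(out) == p + q
    outputs.append(out)
    print(p, q, out)

# No two input combinations produce the same pattern of outgoing balls.
assert len(set(outputs)) == 4
```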

But ultimately Ed wanted to implement reversibility not just for things like circuits, but also—imitating the reversibility that he believed was fundamental to physics—for cellular automata. Now the fact is that reversibility for cellular automata had actually been quite well studied since the 1950s. But I don’t think Ed knew that—and so he invented his own way to “get reversibility” in cellular automata.

It came from something Ed had seen on the PDP-1 back in 1961. As Ed tells it, in playing around with the PDP-1 he had come up with a piece of code that surprised him by drawing something close to a circle in pixels on the screen. Minsky had apparently “gone into the debugger” to see how it worked—and in 1972 HAKMEM attributed the algorithm to Minsky (though in the Pascal program I got from Ed in 1982, it appears as a function called efpattern()). Here’s a version of the algorithm:
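A minimal Python sketch of the idea, following the HAKMEM description: update x from the old y, then update y from the new x. This is not the efpattern() code from the Pascal program; using integer floor division by a divisor d (in place of multiplying by a small epsilon) is an assumption made here to keep everything on integer pixel coordinates. The inverse function shows that each step can be undone exactly.

```python
def minsky_step(x, y, d):
    """One step of the HAKMEM-style circle algorithm with integer divisor d."""
    x = x - y // d      # update x from the OLD y
    y = y + x // d      # update y from the NEW x (using the old x would spiral outward)
    return x, y


def minsky_unstep(x, y, d):
    """Exact inverse of minsky_step: undo the two updates in reverse order."""
    y = y - x // d
    x = x + y // d
    return x, y


def minsky_points(x, y, d, n):
    """Collect n successive points starting from (x, y)."""
    points = []
    for _ in range(n):
        x, y = minsky_step(x, y, d)
        points.append((x, y))
    return points


pts = minsky_points(100, 0, 8, 200)   # try other divisors d to see different shapes
```

Because every step subtracts or adds exactly the same integer that the inverse adds or subtracts back, running the inverse the same number of times always returns the starting point, whatever d is.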

And, yes, with different divisors d it can give rather different (and sometimes wild) results:

But for our purposes here what’s important is that Ed found out that this algorithm is reversible—and he realized that in some sense the reason is that it’s based on a second-order recurrence. And, once again, the basic ideas here are well known in math (cf. reversibility of the wave equation, which is second order). But Ed had a more computational version: a second-order cellular automaton in which one adds mod 2 the value of a cell two steps back. And I think in 1982 Ed was already talking about this “mod-2 trick”—and perhaps the PERQ program was intended to implement it (though it didn’t).
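The "mod-2 trick" can be sketched in a few lines of Python: compute an ordinary cellular automaton update from the current row, then XOR (add mod 2) the row from two steps back. Since XOR undoes itself, the same formula run with the roles swapped recovers the earlier row, so the system is reversible even when the underlying rule is not. The choice of rule 30 as the underlying rule and of a cyclic (ring) boundary are assumptions for illustration; any elementary rule works the same way.

```python
import random


def rule_bit(rule, left, center, right):
    """Output of an elementary CA rule (Wolfram numbering) for one 3-cell neighborhood."""
    return (rule >> (4 * left + 2 * center + right)) & 1


def second_order_step(prev, cur, rule):
    """(prev, cur) -> (cur, nxt), with nxt = f(neighborhood in cur) XOR prev, on a ring."""
    n = len(cur)
    nxt = [rule_bit(rule, cur[i - 1], cur[i], cur[(i + 1) % n]) ^ prev[i] for i in range(n)]
    return cur, nxt


def second_order_unstep(cur, nxt, rule):
    """Exact inverse: prev = f(neighborhood in cur) XOR nxt, because XOR undoes itself."""
    n = len(cur)
    prev = [rule_bit(rule, cur[i - 1], cur[i], cur[(i + 1) % n]) ^ nxt[i] for i in range(n)]
    return prev, cur


random.seed(0)
start = ([random.randint(0, 1) for _ in range(41)], [random.randint(0, 1) for _ in range(41)])
state = start
for _ in range(100):
    state = second_order_step(*state, 30)     # run forward 100 steps
for _ in range(100):
    state = second_order_unstep(*state, 30)   # then backward 100 steps
assert state == start                         # the initial pair of rows is recovered
```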

Ed’s work on reversible logic and “digital physics” in a sense came to a climax with the 1981 Physics of Computation conference at MIT—that brought in quite a Who’s Who of people who’d been interested in related topics (as I mentioned above, I wasn’t there because of a clash with the release of SMP Version 1.0, though I did meet or at least correspond with most of the attendees at one time or another):

Originally Ed wanted to call the conference “Physics and Computation”. But Feynman objected, and the conference was renamed. In the end, though, Feynman gave a talk entitled “Simulating Physics with Computers”—which most notably talked about the relation between quantum mechanics and computation, and is often seen as a key impetus for the development of quantum computing. (As a small footnote to history, I worked with Feynman quite a bit on the possibility of both quantum computing and quantum randomness generation, and I think we were both convinced that the process of measurement was ultimately going to get in the way—something that with our Physics Project we are finally now beginning to be able to analyze in much more detail.)

But despite his interactions with Feynman, Ed was never too much into the usual ideas of quantum mechanics, hoping (as he said in the flyer for his course on digital physics) that perhaps quantum mechanics would somehow fall out of a classical cellular-automaton-based universe. But when quantum computing finally became popular in the 1990s, reversible logic was a necessary feature, and the Fredkin gate (also known as CSWAP or “controlled-swap”) became famous. (The Toffoli gate—or CCNOT—is a bit more famous, though.)

In tracing the development of Ed’s ideas, particularly about “digital physics”, there’s another event worthy of mention. In late 1969 Ed learned about an older German tech entrepreneur named Konrad Zuse who’d published an article in 1967 (and a book in 1969) on Rechnender Raum (Calculating Space)—mentioning the term “cellular automata”:

Although Zuse was 24 years older than Ed, there were definitely similarities between them. Zuse had been very early to computers, apparently building one during World War II that suffered an air raid (and may still lie buried in Berlin). After the war, Zuse started a series of computer companies—and had ideas about many things. He’d been trained as an engineer, and perhaps it was having worked on solving his share of PDEs using finite differences that led him to the idea—a bit like Ed’s—that space might fundamentally be a discrete grid. But unlike Ed, Zuse for the most part seemed to think that—as with finite differences—the values on the grid should be continuous, or at least integers. Ed arranged for Zuse’s book to be translated into English, and for Zuse to visit MIT. I don’t know how much influence Zuse had on Ed, and when Ed talked to me about Zuse it was mostly just to say that people had treated his ideas—like Ed’s—as rather kooky. (I exchanged letters with Zuse in the 1980s and 1990s; he seemed to find my work on cellular automata interesting.)

Ideas & Inventions Galore

It wasn’t just physics that Ed had ideas about. It was lots of other things too. Sometimes the ideas would turn into businesses; more often they’d just stay as ideas. Ed’s archive, for example, contains a document on the “Intermon Idea” that Ed hoped would “provide a permanent solution to the world’s problem of not having a stable medium of exchange”:

And, no, Ed wasn’t Satoshi Nakamoto—though he did tell me several times that (although, to his displeasure, it was never acknowledged) he had suggested to Ron Rivest (the “R” of RSA cryptography) the idea of “using factoring as a trapdoor”. And—not content with solving the financial problems of the world, or, for that matter, fundamental physics—Ed also had his “algorithmic plan” to prevent the possibility of World War III.
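What "using factoring as a trapdoor" means can be seen in the textbook RSA scheme, sketched here in Python with deliberately tiny primes (far too small to be secure). Encryption needs only the public numbers n and e; decryption needs the private exponent d, which is easy to compute if you know the factors p and q of n, and believed to be hard otherwise. This is the standard textbook construction, not a reconstruction of Ed's suggestion to Rivest.

```python
# Toy RSA with textbook-sized numbers: far too small to be secure.
p, q = 61, 53                 # the secret trapdoor: the factors of n
n = p * q                     # 3233, published
phi = (p - 1) * (q - 1)       # 3120, easy to compute only if you know p and q
e = 17                        # public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent 2753 (modular inverse needs Python 3.8+)


def encrypt(m):
    return pow(m, e, n)       # anyone can do this with the public pair (n, e)


def decrypt(c):
    return pow(c, d, n)       # needs d, and computing d needs the factorization of n


c = encrypt(65)
print(c, decrypt(c))          # 2790 65
```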

And then there was the Muse. Marvin Minsky had long been involved with music, and had assembled out of electronic modules a system that generated sequences of musical notes. But in 1970 Ed and Minsky developed what they called the Muse—whose idea was to be a streamlined system that would use integrated circuits to “automatically compose music”:

In actuality, the Muse produced sequences of notes determined by a linear feedback shift register—in essence a 1D additive cellular automaton—in which the details of the rule were set on its front panel as “themes”. The results were interesting—if rather R2-D2-like—but weren’t what people usually thought of as “music”. Ed and Minsky started a company named Triadex (note the triangular shape of the Muse), and manufactured a few hundred Muses. But the venture was not a commercial success.
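A linear feedback shift register is simple to sketch: a row of bits shifts along by one each step, and the new bit entering is the mod-2 sum (XOR) of a few "tapped" bits. The Python sketch below is hypothetical in all its specifics: the Muse's real register length, tap positions, "theme" switches, and note mapping are not described here, so a 4-bit register with the recurrence a[k+4] = a[k+1] XOR a[k] and an arbitrary 3-bits-to-scale-degree mapping are used purely for illustration.

```python
from itertools import islice

SCALE = ["C", "D", "E", "F", "G", "A", "B", "C'"]   # hypothetical mapping to a major scale


def shift(a):
    """One LFSR step: drop the oldest bit, append a[1] XOR a[0] (recurrence a[k+4] = a[k+1] ^ a[k])."""
    return a[1:] + [a[1] ^ a[0]]


def lfsr_bits(state):
    """Infinite stream of output bits from a 4-bit register started at `state`."""
    a = list(state)
    while True:
        yield a[0]
        a = shift(a)


def period(state):
    """Number of steps before the register returns to its starting state."""
    start = list(state)
    a, steps = shift(start), 1
    while a != start:
        a, steps = shift(a), steps + 1
    return steps


def notes(state, count):
    """Turn successive overlapping 3-bit windows of the bit stream into scale degrees."""
    bits = list(islice(lfsr_bits(state), count + 2))
    return [SCALE[4 * bits[i] + 2 * bits[i + 1] + bits[i + 2]] for i in range(count)]


print(period((1, 0, 0, 0)))
print(notes((1, 0, 0, 0), 15))
```

With these particular taps the register cycles through all 15 nonzero 4-bit states before repeating, so the "tune" loops with period 15; other tap choices give shorter or longer cycles.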

Particularly through interacting with Minsky, Ed was quite involved in “things that should be possible with AI”. The Muse had been about music. But Ed also for example thought about chess—where he wanted to build an array of circuits that could tree out possible moves. Working with Richard Greenblatt (who had developed an earlier chess machine) my longtime friend John Moussouris ended up designing CHEOPS (a “Chess-Oriented Processing System”) while Ed was away at Caltech. (Soon thereafter, curiously enough, Moussouris would go to Oxford and work with Roger Penrose on discrete spacetime—in the form of spin networks. Then in later years he would found two important Silicon Valley microprocessor companies.)

Keeping on the chess theme, Ed would in 1980 (through his Fredkin Foundation) put up the Fredkin Prize for the first computer to beat a world champion at chess. The first “pre-prize” of $5k was awarded in 1981; the second pre-prize of $10k in 1988—and the grand prize of $100k was awarded in 1997 with some fanfare to the IBM Deep Blue team.

Ed also put up a prize for “math AI”, or, more specifically, automated theorem proving. It was administered through the American Mathematical Society and a few “milestone prizes” were given out. But the grand Leibniz Prize “for the proof of a ‘substantial’ theorem in which the computer played a major role” was never claimed, the assets of the Fredkin Foundation withered, and the prize was withdrawn. (I wonder if some of the things done in the 1980s and 1990s by users of Mathematica should have qualified—but Ed and I never made this connection, and it’s too late now.)

Ed the Consultant

Particularly during his time at MIT, Ed did a fair amount of strategy consulting for tech companies—and Ed would tell me many stories about this, particularly related to IBM and DEC (which were in the 1980s the world’s two largest computer companies).

One story (whose accuracy I’ve never been able to determine) related to DEC’s ultimately disastrous decision not to enter the personal computer business. As Ed tells it, a team at DEC did a focus group about PCs—with Ken Olsen (CEO of DEC) watching. There was a young teacher in the group who was particularly enthusiastic. And Olsen seemed to be getting convinced that, yes, PCs were a good idea. As the focus group was concluding, the teacher listed off all sorts of ways PCs could change the world. But then, fatefully, he added right at the end: “And I don’t just mean here on Earth”. Ed claims this was the moment when Olsen decided to kill the PC project at DEC.

Ed told a story from the early 1970s about a giant IBM project called FS (for “Future Systems”):

IBM has this project. They’re going to completely revolutionize everything. The project is to design everything from the smallest computer to the new largest. They’re all to be multiprocessors. The specs were just fantastic. They promised to guarantee their customers 100% uptime. Their plans were, for instance, when you have a new OS, it’s updated. They guarantee 24-hour operation at all times. They plan to be able to update the OS without stopping this process. Things like that, a lot of goals that are very lofty, and so on.

Someone at IBM whom I knew very well, a very senior guy, came to me one day and said, “Look, these guys are in trouble, and maybe MIT could help them.” I organized something. Just under 30 professors of computer science came down to IBM. We got there on Sunday night and starting Monday morning, we got one lecture an hour, eight on Monday, Tuesday, Wednesday, Thursday, and four on Friday, describing the system. It was just spectacular, everything they were trying to do, but it was full of all kinds of idiocy. They were designing things that they’d never used. This whole thing was to be oriented about people looking at displays.

No one at IBM had done anything like that. They think, “Okay, you should have a computer display,” and they came up with certain problems that hadn’t occurred to the rest of us. If you’re looking at the display, how can you tell the difference between what you had put into the computer and what the computer had put in? This worried them. They came up with a hardware fix. When you typed, it always went on the right half of the screen; when the computer did something, it always went on the left half, or I may have it backwards, but that was the hardware.

…

What happened is I came to realize that they were so over their head in their goal that they were going to annihilate themselves with this thing. It was just going to be the world’s greatest fiasco for it. I started cornering people and saying, “Look, do you realize that you’re never going to make this work?” and so on, so forth. This came to the attention of people at IBM, and it annoyed them. I got a call from someone saying, “Look, you’re driving us nuts. We want to hear you out, so we’re going to conduct a debate.” There’s a guy named Bob [Evans], who was the head of the project. What happened was we’re in the boardroom with IBM, lots of officials there, and he and I have a debate.

I’m debating that they have to kill the project and do something else. He’s debating that they shouldn’t kill the project. I made all my points. He made all his points. Then a guy named Mannie Piore, who was the one who thought of the idea of having a research laboratory, a very senior guy said to me, he said, “Hey, Ed,” he said, “We’ve heard you out.” He says, “This is our company. We can do this product even if you think we shouldn’t.” I said, “Yes, I admit that’s true.” He said, “You presented your case. We’ve heard you out, and we want to do it.” I said, “Okay.” He said, “Can you do us a favor?” I said, “What’s that?” He said, “Can you stop going around talking to people about why it has to be killed?” I said, “Look, I’ve said my piece. I’ve been heard out.” “Yes. Okay.” “I quit.”

I had only one ally in that room; that was John Cocke. As we were walking out of the room, he came over to me and said, “Don’t worry, Ed.” He said, “It’s going to fall over of its own weight.” I’ll never forget that. Ten days later, it was canceled. A lot of people were very mad at me.

I’m not sure what Ed was like as an operational manager of businesses. But he certainly had no shortage of opinions about how businesses should be run, or at least what their strategies should be. He was always keen on “do-the-big-thing” ideas. I remember him telling me multiple times about a company that did airplane navigation. It had put a certain number of radio navigation beacons into its software. Ed told me he’d asked about others, and the company had said “Well, we only put in the beacons lots of people care about”. Ed said “Just put all of them in”. They didn’t. And eventually they were overtaken by a company that did.

Ed the Businessman

Ed’s great business success—and windfall—was III. But Ed was also involved with a couple dozen other companies—almost all of which failed. There’s a certain charm in the diversity of Ed’s companies. There was Three Rivers Computer Corporation, that made the PERQ computer. There was Triadex, that made the Muse. There was a Boston television station. There was an air taxi service. There was Fredkin Enterprises, importing PCs into the Soviet Union. There was Drake’s Anchorage, the resort on his island. There was Gensym, a maker of AI-oriented process control systems, which was a rare success. And then there was Reliable Water.

Ed’s island—like many tropical islands—had trouble getting fresh water. So Ed decided to invent a solution, coming up with a new, more energy-optimized way to do reverse osmosis—with a dash of AI control. Reliable Water announced its product in May 1987, desalinating water taken from Boston Harbor and serving it to journalists to drink. (Ed told me he was a little surprised how willingly they did so.)

Looking at my archives I see I was sufficiently charmed by the picture of Ed posing with his elaborate “intelligent” glass tubing that I kept the article from New Scientist:

As Ed told it to me, Reliable Water was just about to sell a major system to an Arab country when his well-pedigreed CEO somehow cheated him, and the deal fell through.

But what about the television station? How did Ed get involved with that? Apparently in 1969 Jerry Wiesner, then president of MIT, encouraged Ed to support a group of Black investors (led by a certain Bertram Lee) who were challenging the broadcasting license of Boston’s channel 7. Years went by, other suitors showed up, and litigation about the license went all the way to the Supreme Court (which described the previous licensee as having shown an “egregious lack of candor” with the FCC). For a while it seemed like channel 7 might just “go dark”. But in early January 1982 (just a couple of weeks before I first met him) Ed took over as president of New England Television Corporation (NETV)—and in May 1982 NETV took over channel 7, leaving Ed with a foot of acquisition documents in his home library, and a television channel to run:

There’d been hopes of injecting new ideas, and adding innovative educational and other content. But things didn’t go well and it wasn’t long before Ed stepped down from his role.

A major influence on Ed’s business activities came out of something that happened in his personal life. In 1977 Ed had been married for 20 years and had three almost-grown children. But then he met Joyce. On a flight back from the Caribbean he sat next to a certain Joyce Wheatley who came from a prominent family in the British Virgin Islands and had just graduated with a BS in economics and finance from Bentley College (now Bentley University) in Waltham, MA. As both Ed and Joyce tell it, Ed immediately offered advice, such as that the best way to overcome a fear of flying was to learn to fly (which, much later, Joyce in fact did).

Joyce was starting work at a bank in Boston, but matters with Ed intervened, and in 1980 the two of them were married in the Virgin Islands, with Feynman serving as Ed’s best man (and at the last minute lending Ed a tie for the occasion). In 1981, Ed and Joyce had a son, whom they named Richard after Richard Feynman (though he now goes by “Rick”)—of whom Ed was very proud.

When Ed died, Joyce and he had been married for 43 years—and Joyce had been Ed’s key business partner all that time. They made many investments together. Sometimes it’d start with a friend or vendor. Sometimes Ed (or Joyce) would meet students or others—who’d be invited over to the house some evening, and leave with a check. Sometimes the investments would be fairly hands-off. Sometimes Ed would get deeply involved, even at times playing CEO (as he did with Three Rivers and NETV).

When the web started to take off, Ed and Joyce created a company called Capital Technologies which did angel investing—and ended up investing in many companies with names like Sourcecraft, SqueePlay, EchoMail, Individual Inc. and Radnet. And—like so many startups of this kind—most failed.

Ed also continued to have all sorts of ideas of his own, some of which turned into patents. And—like so much to do with Ed—they were eclectic. In 1995 (with a couple of other people) there was one based on using evanescent waves (essentially photon tunneling) to more accurately find the distance between the read/write head and the disk in a disk drive or CD-ROM drive. Then in 1999 there was the “Automatic Refueling Station”—using machine vision plus a car database to automate pumping gas into cars:

That was followed in 2003 by a patent about securely controlling telephone switching from web clients. In 2006, there was a patent application named simply “Contract System” about an “algorithmic contract system” in which the requirements of buyers and sellers of basically anything would be matched up in a kind of tiling-oriented geometrical way:

In 2011 there was “Traffic Negotiation System”, in which cars would have rather-airplane-like displays installed that would get them in effect to “drive in formation” to avoid traffic jams:

Ed’s last patent was filed in 2015, and was essentially for a scheme to cache large chunks of the web locally on a user’s computer—a kind of local CDN.

But all these patents represented only a small part of Ed’s “idea output”. And, for example, Ed told me about many other tech ideas he had—a few of which I’ll mention later.

And Ed’s business activities weren’t limited to tech. He did his share of real-estate transactions too. And then there was his island. For years Joyce and Ed continued to operate Drake’s Anchorage, and tried to improve the infrastructure of the island—with Ed, as Joyce tells it, more often to be found helping to fix the generator on the island than partaking of its beaches.

Back in 1978 Ed had acquired a “neighbor” when Richard Branson bought Necker Island, which was a couple of miles further out towards the Atlantic than Moskito Island. Ed told me quite a few stories about Branson, and for years had told me that Branson wanted to buy his island. Ed hadn’t been interested in selling, but eventually agreed to give Branson right of first refusal. Then in 2007 a Czech (or was it a Russian?) showed up and offered to buy the island for cash “to be delivered in a suitcase”. It was all rather sketchy, but Ed and Joyce decided it was finally time to sell, and let Branson exercise his right of first refusal, and buy the island for about $10M.

Ed and His Toys

Ed liked to buy things. Computers. Cars. Planes. Boats. Oh, and extra houses too (Vermont, Martha’s Vineyard, Portola Valley, …)—as well as his island. Ed would typically make decisions quickly. A house he drove by. New tech when it first came out. He was always proud of being an early adopter, and he’d often talk almost conspiratorially about the “secret” features he’d figured out in new tech he’d bought.

But I think Ed’s all-time favorite “toys” were planes—and over the course of his life he owned a long sequence of them. Ed was a serious (and, by all reports, exceptionally good) pilot—with an airline transport pilot license (plus seaplane and glider licenses). And I always suspected that his cut-and-dried approach to many things reflected his experience in making decisions as a pilot.

Ed at different times had a variety of kinds of planes, usually registered with the vanity tail number N1EF. There were twin-propeller planes. There were high-performance single-propeller planes. There was the seaplane that I’d “met” in the Caribbean. At one time there was a jet—and in typical fashion Ed got himself certified to fly the jet singlehandedly, without a copilot. Ed had all sorts of stories about flying. About running into Tom Watson (CEO of IBM) who was also a pilot. About getting a new type of plane where he thought he was getting #5 off the production line, but it was actually #1—and one day its engine basically melted down, but Ed was still able to land it.

Ed also had gliders, and competed in gliding competitions. Several times he told me a story—as a kind of allegory—about another pilot in a gliding competition. Gliders are usually transported with their wings removed, and the wings are attached before flying. Apparently there was an extra locking pin, which the other pilot decided to remove to save weight, because it didn’t seem necessary. But when the glider was flying in the competition its wings fell off. (The pilot had a parachute, but landed embarrassed.) The very pilot-oriented moral as far as Ed was concerned: just because you don’t understand why something is there, don’t assume it’s not necessary.

Ed and the Soviet Union

One of the topics about which Ed often told “you-can’t-make-this-stuff-up” stories was the Soviet Union. Ed’s friend John McCarthy had parents who were active communists, had learned Russian, and regularly took trips to the Soviet Union. And as Ed tells it McCarthy came to Ed one day and said (perhaps as a result of having gotten involved with a Russian woman) “I’m moving to the Soviet Union”, and talked about how he was planning to dramatically renounce his US citizenship. McCarthy began to make arrangements. Ed tried to talk him out of it. And then came 1968, when the Soviets sent their tanks into Czechoslovakia—and McCarthy was incensed and, according to Ed, sent a telegram to a very senior person in the Soviet Union saying “If you invade Czechoslovakia then I’m not coming”. Needless to say, the Soviets ignored him. Ed told me he’d said at the time: “If the Russians were really smart and really understood things, and they had to choose between John McCarthy and Czechoslovakia, they should have chosen John McCarthy.” (McCarthy would later “flip” and become a staunch conservative.)

Perhaps through McCarthy, Ed started visiting the Soviet Union. He didn’t like the tourist arrangements (required to be through the government’s Intourist organization)—and decided to try to do something about it, sending a survey to Americans who’d visited the Soviet Union:

A year later, Ed was back in the Soviet Union, attending a somewhat all-star conference (along with McCarthy) on AI—with a rather modern-sounding collection of topics:

Here’s a photograph of a bearded Ed in action there—with a very Soviet simultaneous translation booth behind him:

Ed used to tell a story about Soviet computers that probably came from that visit. The Soviet Union had made a copy of an IBM mainframe computer—labeling it as a “RYAD” computer. There was a big demo—and the computer didn’t work. The generals in charge asked “Well, did you copy everything?” As it turned out, there was active circuitry in the “IBM” logo—and that needed to be copied too. Or at least that’s what Ed told me.

But Ed’s most significant interaction with the Soviet Union came in the early 1980s. The US had in place its CoCom list that embargoed export of things like personal computers to the Soviet Union. Meanwhile, within the Soviet Union, photocopiers were strictly controlled—to prevent non-state-sanctioned flow of information. But as Ed tells it, he hatched a plan and sold it to the Reagan administration, telling them: “You’re on the wrong track. If we can get personal computers into the Soviet Union, it breaks their lock on the flow of information.” But the problem was he had to convince the Soviets they wanted personal computers.

In 1984 Ed was in Moscow—supposedly tagging along to a physics conference with an MIT physicist named Roman Jackiw. He “dropped in” at the Computation Center of the Academy of Sciences (which, secretly, was a supplier to the KGB of things like speech recognition tech). And there he was told to talk to a certain Evgeny Velikhov, a nuclear physicist who’d just been elected vice president of the Academy of Sciences. Velikhov arranged for Ed to give a talk at the Kremlin to pitch the importance of computers, which he apparently did successfully, after convincing the audience that his motivation was to make the world a safer place by balancing the technical capabilities of East and West.

And as if to back up this point, while he was in the Soviet Union, Ed wrote a 5-page piece from “A Concerned Citizen, Planet Earth” addressed “To whom it may concern” in Moscow and Washington—ending with the suggestion that its plan might be discussed at an upcoming meeting between Andrei Gromyko and Ronald Reagan at the UN:

The piece mentions another issue: the fate of prominent, but by then dissident, Soviet physicist Andrei Sakharov, who was in internal exile and reportedly on hunger strike. Ed hatched a kind of PCs-for-Sakharov plan in which the Soviets would get PCs if they freed Sakharov.

Meanwhile, in true arms-dealer-like fashion, he’d established Fredkin Enterprises, S.A., which planned to export PCs to the Soviet Union. He had his student Norm Margolus spend a summer analyzing the CoCom regulations to see what characteristics PCs needed to have to avoid embargo.

In the Reagan Presidential Library there’s now a fairly extensive file entitled “Fredkin Computer Exports to USSR”—which for example contains a memo reporting a call made on August 25, 1984, by then-vice-president George H. W. Bush to Sakharov’s stepdaughter, who was by that time living in Massachusetts (and, yes, Ed was described as a “PhD in computer science” with a “flourishing computer business”):

Soon the White House is communicating with the US embassy in Moscow to get a message to Ed:

And things are quickly starting to sound as if they were from a Cold War spy drama (there’s no evidence Ed was ever officially involved with the US intelligence services, though):

I don’t think Ed ever ended up talking to Sakharov, but on November 6, 1984, Fredkin Enterprises was sent a letter by Velikhov ordering 100 PCs for the Academy of Sciences, and saying they hoped to order 10,000 more. But the US was not as speedy, and in 1985 there was still back and forth about CoCom issues. Ed of course had a plan:

And indeed in the end Ed did succeed in shipping at least some computers to the Soviet Union, adding a hack to support Cyrillic characters. Ed often took his family with him to Moscow, and he told me that his son Rick created quite a stir when at age 6 he was seen there playing a game on a computer. Up to then, computers had always been viewed as expensive tools for adults. But after Rick’s example there were suddenly all sorts of academicians’ kids using computers.

(In the small world that it is, one person Ed got to know in the Academy of Sciences was a certain Arkady Borkovsky—who in 1989 would leave Russia to come work at our company, and who would later co-found Yandex.)

By the way, to fill in a little color of the time, I might relate a story of my own. In 1987 I went to a (rather Soviet) conference in Moscow on “Logic, Methodology and Philosophy of Science.” Like everyone, I was assigned a “guide”. Mine continually tried to pump me for information about the American computer industry. Eventually I just said: “So what do you actually want to know?” He said: “We’ve cloned the Intel 8086 microprocessor, and we want to know if it’s worth cloning the Motorola 68000. Motorola has put a layer of epoxy that makes it hard to reverse engineer.” He assumed that the epoxy was at the request of the US government, to defeat Soviet efforts—and he didn’t believe me when I said I thought it was much more likely there to defeat Intel.

Ed told me another story about his interactions with Soviet computer efforts after Gorbachev came to power:

Before the days of integrated circuits the way IBM and Digital built computers was they put the whole computer together, and then it would sit for six weeks in “system integration” while they made the pieces work together and slowly got the bugs out.

The Russians built computers differently because that seemed logical to them. They’d send all the components down there and then some guy was supposed to plug them together, and they were supposed to work. But they didn’t. With these big computers, they never made any of them work.

The Academy of Sciences had one. And one time I went to see their big computer, so they unlock the doors to this dusty room where the computer is, where it’s not being used because it doesn’t work, and all this information is being kept secret, not from the United States, but from the leadership. When I discovered all this I documented it … and I wrote a 40-page document that explained it.

I was making trips with Rick often and Mike [his older son] very often. On one trip when I arrived, they tell me, “Oh, you have to come to this meeting.”

I don’t speak Russian. I never knew it. I’m seated at this meeting, and there’s a Russian friend of mine [head of the Soviet Space Research Institute] next to me. We’re just sitting there, and things are going on. I still don’t know what that meeting was, but I had this 40-page document. I gave it to my friend. He starts reading. He says, “Oh, this is so interesting.” It got to be about ten o’clock at night and they said, “Everyone come back in the morning. Nine o’clock.”

My friend said, “Can I borrow this [document]? I’ll bring it back in the morning.” I said, “Sure, go ahead.” He comes back next morning. He says to me, “I have good news, and I have bad news.” I said, “What’s the good news?” He says, “Your document has been translated into Russian.” I said, “You left here with a 40-page typewritten document. I don’t believe you.” He said, “Well, my institute recently took on the task of translating Scientific American into Russian.

“When I left here, I went to my institute, called in the translators, and they all came in. We divided the document up between them, and it’s all been translated into Russian.”

The document was the analysis of the RYAD situation with the recommendation that the only thing they could do was to cancel it all.

I said, “Okay, what’s the bad news?” He says, “The bad news is it’s classified secret.” When you made a copy or did something, you had to have a government person look at it. They classified it. I said to him, “You can’t classify my documents.” He said, “Of course not. We haven’t. It’s just the Russian one that’s secret.”

Then maybe a week later, he said, “Gorbachev’s read your document.” He canceled it. RYAD. Some people I know were looking to kill me.

In Moscow, there’s a building that’s so unusual. It’s on a highway leading into the city. It’s about five stories high. It’s about a kilometer long, okay? It’s a giant building. I was in it a few years ago, and it’s just a beehive of startups, almost all software startups. That was the RYAD Software Center, okay? 100,000 people got put out of work.

Ed Becomes a Physics Professor

When I first met Ed in 1982, he was in principle a professor at MIT. But he was also CEOing a computer company (Three Rivers), and, though I didn’t know it at the time, had just become president of a television channel. Not to mention a host of other assorted business activities. MIT had a policy that professors could do other things “one day a week”. But Ed was doing other things a lot more than that. Ed used to say he was “tricked” out of his tenured professorship. Because in 1986 he was convinced that with all the other things he had going on, he should become an adjunct professor. But apparently he didn’t realize that tenure doesn’t apply to adjunct professors. And, as Ed told it, the people in the department considered him something of a kook, and without tenure forcing them to keep him, were keen to eject him.

Minsky’s neighbor in Brookline, MA, was a certain Larry Sulak—the very energetic chairman of the physics department at Boston University (and someone I have known since the 1970s). Ed knew Sulak and when Ed was ejected from MIT, Sulak seized the opportunity to bring Ed in as a physics professor at Boston University. Sulak asked me to write a letter about Ed (and, yes, particularly after the research for this piece, there are some things I would change today):

Subject: Re: Ed Fredkin

Date: Aug 24, 1988

From: Stephen Wolfram

To: Larry Sulak

Dear Larry:

In this century, people like Ed Fredkin have been very rare. Ed Fredkin
is a gentleman scientist. He has made several fortunes in business, yet
he chooses to spend much of his time thinking about science.

The main thing he thinks about is what ideas from computing can tell us
about physics. This is an area that I believe has fundamental importance
for physics. There are many issues about the behaviour of complex
physical systems where the best hope for analysis and understanding comes
from computational ideas. There are also many traditional problems
in quantum physics and other fundamental areas that I suspect are most
likely to be solved by thinking about things from a computational point
of view.

Ed Fredkin has had some very good ideas about physics and its relation
to computation. Probably the single most important was his independent discovery
of the possibility of thermodynamically reversible computation.
von Neumann got this wrong — by thinking about things from a computational
point of view, Fredkin got it right.

Fredkin has been convinced for many years that cellular automata —
basically computational models — could describe fundamental physical
processes. As you know, I have worked on using cellular automata to
model various specific physical processes. Fredkin is trying to do something
grander — he wants to show that all of physics can be reproduced by
a cellular automaton. If he is right the discovery would be one whose
importance could be compared to the discovery of quantization.
Of course, what he is trying to show may not be true, but that is a risk
that any new fundamental idea in physics faces.

Ed Fredkin’s style is not typical of scientists. He is more used to
addressing boards of directors than lecture audiences. He learned
the kind of physics that is in the Feynman lectures by spending time
with Dick Feynman rather than reading his books. To some standard
scientists, Fredkin at first seems like a nut. To be sure, some of his
ideas are pretty nutty. But if you listen and think about it, there
is much substance to what Fredkin has to say.

I gather that Fredkin has decided to spend some time around “ordinary
physicists”, to try and work out how his ideas fit in with current
physical thinking. I believe you are very lucky that Fredkin wants
to do this in your department.

Best wishes,
Stephen

And so it was that Ed became a research professor of physics at Boston University (BU). At MIT he’d gotten a DARPA grant that supported Tom Toffoli and Ed’s only “physics PhD student” Norm Margolus in building ever-larger “cellular automaton machines”. And when Ed moved to BU, this effort moved with him, leaving in effect “no trace of Ed” at MIT.

When Ed arrived at BU he found he was assigned to an office with a certain Gerard ’t Hooft—who happens to be one of the more creative and productive theoretical physicists of the past half-century (and would win a Nobel Prize in 1999 for his efforts). Ed became friends with ’t Hooft, inviting him and his family to spend time on his island, and later on the boat that Ed bought in the south of France. Feynman died in 1988, and Ed would tell me that he thought he’d “traded” one great physicist for another. (Feynman had suggested Ed try Sidney Coleman, but Coleman wasn’t into it.)

Like Feynman, I think ’t Hooft felt a little uneasy with Ed’s statements about physics. But in 2016 ’t Hooft ended up publishing a book entitled The Cellular Automaton Interpretation of Quantum Mechanics. I thought it was a nice recognition of ’t Hooft’s friendship with Ed. But Ed told me in no uncertain terms that he thought ’t Hooft hadn’t given him the credit he was due—though in reality I don’t think what ’t Hooft did was much related to Ed’s actual work and ideas. (And, by the way, it’s not directly related to my efforts either, though conceivably looking at “generational states” in our Physics Project may give something at least somewhat analogous.)

In 1994 Ed’s direct affiliation with BU ended—though he remained on good terms with the department, and after I moved to the Boston area in 2002 I would often see him at an annual dinner the BU physics department put on for “Boston-area physics people”.

In 1998 Ed would summarize himself like this:

Ed Fredkin has worked with a number of companies in the computer field and has held academic positions at a number of universities. He is a computer programmer, a pilot, advisor to businesses and an amatuer [sic] physicist. His main interests concern digital computer like models of basic processes in physics.

For a while, Ed didn’t have a “university affiliation” (except, through Minsky, as a visitor at the MIT Media Lab), but in 2003—through his friend Raj Reddy—he became a professor (now of computer science) at Carnegie Mellon University, for a while spending time at their West Coast outpost, but mostly just making occasional trips in his plane to Pittsburgh.

Forty Years of Interactions with Ed

For a few years after I first met Ed in 1982, I’d see him fairly regularly. In 1983 I invited him to the first “modern” conference on cellular automata, which I co-organized at Los Alamos. I visited his house in Brookline, MA, a few times. I saw him at the Aspen Center for Physics, and at other places around the world. He was always fun and lively—and told great stories about all sorts of things. He gave the impression that he was mostly spending his time doing big things in business, and that science was an avocation for him. Sometimes he would talk about cellular automata—though I now realize that what he said was either very general and philosophical (leaving me to interpret things in my own way), or very specific to particular rules he’d engineered.

It was always a bit uncomfortable when it came to physics. Because the things Ed was saying always seemed to me pretty naive. Quite often I would challenge them—and frustratedly tell Ed that he should learn twentieth-century physics. But Ed would glide over it—and be off telling some other (engaging) story, or some such.

In 1986 I co-organized (with Tom Toffoli and Charles Bennett) a conference called Cellular Automata ’86—at MIT. Ed didn’t come—and I think I had the impression that he’d rather lost interest in cellular automata by that time. I myself went off to start my Center for Complex Systems Research, and then to found Wolfram Research and start the development of Mathematica. Mathematica was released on June 23, 1988—and our records (yes, we’ve kept them!) show that Ed registered his first copy on December 14, 1988. In March 1991 I did a lecture tour about Mathematica 2.0, and saw Ed one last time before diving into work on my book A New Kind of Science—which led me for more than a decade to become an almost complete scientific hermit.

I saw Ed (now 62 years old) when I briefly “came up for air” in connection with the release of Mathematica 3.0 in 1996, and we continued occasionally to exchange pleasant emails:

Date: Sun, 29 Jun 1997 15:49:41 -0400

From: Ed Fredkin

To: Stephen Wolfram

…

[Reporting the birth of my second child]

…

For many children its worst when they are teenagers. Some glide through
that period of life without hassle. Rick is doing great (at 15) despite
his unorthodox education. He relishes calling his parents dopes, but
aside from arguments about subjects like how late he should be able to
hang out with his buddies, its clear that he doesn’t think we’re dopes.

…

I promise to read your book as soon as I get it!

…

Its nice to hear from you. News here is that I am no longer needed at
Radnet as they now have a great CEO. I got a new airplane in December.
It’s called a Cessna CitationJet. It can carry 7 people at about 440
mph. So far its been a lot of fun. We’ll have to think of an excuse to
go for a ride. We are planning to spend some time at Drake’s Anchorage
in July. Its great for kids so if that interests you, let me or Joyce
know.

I have taken as a challenge to architect a computer (that weighs a few
kilos) that assumes another 100 years of Moore’s Law (10^15 in cost
performance). There are a lot of unsuspected problems lurking in the
details, but everyone of them seems to have easy solutions. I have
given a number of talks (IBM Almaden and Watson labs, Intel, NYU,
etc.). Interest in reversible computing has picked up since heat
dissipation has gotten to be a really hot topic (no pun intended). The
next high end Alpha may dissipate as much as 150 watts. Think of a
light bulb!
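Ed’s “10^15 in cost performance” over 100 years corresponds, assuming steady exponential growth, to a doubling time T of about two years, the figure usually quoted for Moore’s Law:

```latex
2^{100/T} = 10^{15}
\quad\Longrightarrow\quad
T = \frac{100\,\log 2}{15\,\log 10} = \frac{100}{15\,\log_2 10} \approx \frac{100}{49.8} \approx 2.0 \text{ years}
```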

I use Mathematica for something almost every week… keep it up!

Best regards,

Although I didn’t see Ed myself for quite a few years, Ed would always write to ask for betas of new versions of Mathematica, and he would sometimes chat with staff from my company at trade shows. I thought it a bit odd in 1999 when I heard that in such an encounter he said that he was the one who had “introduced me to cellular automata”. And, moreover, that he, Feynman and Murray [Gell-Mann] were the people who’d suggested I write SMP—which was particularly bizarre since, among other things, I hadn’t met (or even heard of) Ed until about 3 years later.

Then, out of the blue on September 13, 2000, Ed calls my assistant, and follows up with an email:

Subject: Invitation

Date: Wed, 13 Sep 2000 23:53:09 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

The primary reason I’m contacting you has to do with a program I’m
organizing at Carnegie Mellon (CMU). I wrote a proposal to the NSF, called
“The Digital Perspective” and got funded. The idea is to invite a number (8
to 10) of guests to come to CMU for a few days, to meet with students and to
give a Distinguished Lecture. The NSF would also like to arrange for the
guests to come to Washington D.C. and give the same lecture there.

By “Digital Perspective” I mean looking at aspects of the world as Digital
Processes. As you know, I am most interested in looking at physics this
way. I have just started getting commitments from potential participants.
Gerard ’t Hooft has agreed to come and a number of other good physicists are
thinking about it.

…

Please consider this to be a formal invitation. Of course, CMU will pay
expenses and an honorarium. If the timing works out, it can probably be
arranged for many of the students to have read your book before you come.
You might get some good feedback from bright students who have also gained
familiarity with the thoughts of others who are thinking about the “Digital
Perspective”. The seminar will run throughout the 2000-2001 academic year.

If you can make it to CMU, I expect that it will be fun and interesting;
both for you, for me and for many others.

…

I responded:

Subject: RE: Invitation

Date: Thu, 14 Sep 2000 06:49:19 -0500

From: Stephen Wolfram

To: Ed Fredkin

Thanks for the invitation, etc.

It sounds like a thing I’d like to do, but I can only consider
*anything* after my book is finished.

…

If my book is done in time for your program, then, yes, I’d like to
participate (though of course I’d want more details about the actual
plans etc. etc.). But if the book isn’t done, then sadly I just can’t.
If the cutoff time is June 2001, I am not extremely hopeful that the
book will be done … but if it’s fall 2001 the probabilities go up
substantially (though, sadly, they are still not 100%).

…

And what are you up to these days? Business? Science? Other?

On another topic:
In my book, I’m trying very hard to write accurate history notes about
the things I discuss. And for the notes on the history of cellular
automata I’ve been meaning for ages to ask you some questions…

I’m not sure this is a complete list, but here are a few I’ve been
curious about for a long time that I’d really like to know the answers
to…

I know that history is hard … even if it’s about oneself. I consider
that I have a good memory, but it’s often hard for me to keep straight
what happened when, and why, etc. But anything you can tell me about
these questions … or about other aspects of CA history … I’d be very
grateful for.

1. As far as you know, did you invent the 2D XOR CA rule? (I’m assuming
the answer is “yes”…)

2. In what year did you first simulate this CA? On what computer?
Where?

3. What other CA rules did you study at that time?

4. Do you still have any material from the simulations you did
(printouts, tapes, programs, etc.)?

5. When you learn about the “munching squares” display hack? How did it
relate to your work on the XOR CA?

6. What did you know about the work done by Unger etc. on cellular image
processors? How did this relate to your work?

7. What did you know about von Neumann’s work on cellular automata? How
did it relate to your work?

8. What did you know about Ulam and others’ work at Los Alamos on
simulating cellular automata? How did it relate to your work?

9. Were you aware of work on cryptographic applications of CA-like
systems?

…

Ed responded:

Subject: RE: Invitation

Date: Fri, 15 Sep 2000 01:25:23 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

Here are some answers and some free association type ramblings.

…

> And what are you up to these days? Business? Science? Other?

I’m winding down on business (I’m into one last e-business project) and like
you, working on a book. My guess is that mine is nowhere as ambitious as
yours… It’s just to document my ideas about Digital Mechanics (Physics).
In any case, these ideas have made more progress in the last 2 years than in
the previous 40.

I bought a sailboat which is moored in Antibes, France. I spent most of the
summer there and got more science done than in the prior several years.
It’s absolutely the perfect place and circumstance for me to work on my
stuff. Gerard ’t Hooft (plus wife and daughter) came down and joined us for
a while. You know (I hope) about his interest in CA’s? I’m going back
there for a few weeks on Tuesday.

Here’s a formal proof that you can, at any time, escape all your normal
responsibilities and concentrate exclusively on one really important thing
(hint, hint). The proof is that, at any time, YOU CAN DIE. I don’t mean to
be morbid, but sometimes it makes good sense to consider that proof and
temporarily abandon all but some very important task (or some very exciting
or fun thing).

Ed continued with a long response to my “history questionnaire”:

> 1. As far as you know, did you invent the 2D XOR CA rule? (I’m assuming
> the answer is “yes”…)

Yes, as far as I know I did invent it. Here is what I did. I decided to
look for the simplest possible rule that met certain criteria. I wanted
spatial symmetry and a symmetric rule vis-à-vis the states of the cells.
The thought was to find something so simple that its behavior could be
understood while not so simple as to be totally dull. The first such rule I
tried was the XOR rule. I programmed it first on the PDP-1 (1961, at BBN
and III) where I could see it on the display, and later I wrote a program
for CTSS using a model 33 teletype as a terminal. My motivation was then,
as it is now, to be able to capture more and more properties of physics
within a Digital model. I found an easy proof as to why patterns reappeared
in any number of dimensions. I also found, at the beginning, a formula for
the number of ones as a function of time from a single one as the initial
state. My recollection was that it was something like 2D 2^b(t) where D is
the number of dimensions, t is the time step, and b(t) is the number of bits
that are one in the binary representation of t (the tally function). After
I showed all this to Seymour Papert, he generalized the proof re self
replication from XOR (sum mod 2) to sum mod any prime. (Some time around
1967)
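Ed’s description is easy to explore with a few lines of code. The sketch below is an illustration, not a reconstruction of his PDP-1 or CTSS program: it assumes the rule sets each cell to the sum (mod p) of its four von Neumann neighbours, not counting the cell itself, so that p = 2 gives the XOR rule and other primes correspond to Papert’s generalization. The names `step`, `evolve`, `shifted` and `popcount` are hypothetical. Under that assumption the number of 1s grown from a single 1 comes out as 4^b(t), and after p steps a small pattern reappears as four copies of itself, each displaced by p cells along one axis.

```python
from collections import defaultdict

# Assumed rule: each cell becomes the sum (mod p) of its four von Neumann
# neighbours, not counting itself. With p = 2 this is the XOR rule.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def step(state, p):
    """Advance a sparse state {(x, y): value} by one step of the sum-mod-p rule."""
    totals = defaultdict(int)
    for (x, y), value in state.items():
        # A nonzero cell contributes its value to each of its four neighbours.
        for dx, dy in NEIGHBOURS:
            totals[(x + dx, y + dy)] += value
    # Keep only cells whose new value is nonzero mod p.
    return {cell: v % p for cell, v in totals.items() if v % p}


def evolve(state, p, t):
    """Apply t steps of the rule."""
    for _ in range(t):
        state = step(state, p)
    return state


def shifted(state, dx, dy):
    """Translate a pattern by (dx, dy)."""
    return {(x + dx, y + dy): v for (x, y), v in state.items()}


def popcount(t):
    """Number of 1 bits in t (the 'tally function' b(t))."""
    return bin(t).count("1")


# XOR rule from a single 1: number of 1s at step t, next to 4**b(t).
for t in range(1, 9):
    print(t, len(evolve({(0, 0): 1}, 2, t)), 4 ** popcount(t))

# Sum mod 3: after 3 steps a small seed becomes four displaced copies.
p = 3
seed = {(0, 0): 1, (1, 0): 2, (0, 1): 1}
copies = {}
for dx, dy in NEIGHBOURS:
    copies.update(shifted(seed, p * dx, p * dy))
print(evolve(seed, p, p) == copies)
```

The replication works because, mod a prime p, raising the neighbour sum to the power p leaves only the p-th powers of each term, so p steps act like a single step with neighbours p cells away; the copies stay separate as long as the seed is smaller than p in each direction.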

> 2. In what year did you first simulate this CA? On what computer?
> Where?

See above.

> 3. What other CA rules did you study at that time?

I found a simple proof that a von Neumann neighborhood CA could exactly
emulate any other (such as the 3×3 neighborhood) and used this as a reason
to look at nothing else. I explored so many different rules that I probably
would have found the game of Life had I not put blinders on. After I came
to MIT (1968), I had 2 things in mind, to find a really simple Universal CA
(I call them UCA’s) and to find Reversible, Universal CA’s (RUCA’s)
As you may know, the search for UCA’s went slowly until I had the idea to
abandon the Turing Machine model and look at modeling digital logic and
wires. Within 15 minutes after this idea occurred to me, I had a 4 state
UCA on my blackboard. At that time the best known was in Codd’s thesis; an
8 state UCA. I showed this to a student of mine, Roger Banks, who had been
struggling for a few years trying to complete an AI PhD thesis. The next
morning both he and I showed up with 3 state UCA’s. He switched his PhD topic
and found a 2 state, von Neumann neighborhood UCA, a thing that Codd
purported to have proved impossible.

While at BBN, after seeing all my 2-D CA’s expanding with simple
kaleidoscope like symmetries, (like the diamond shapes in the XOR rule),
Marvin Minsky challenged me to find a rule (any rule) that showed spherical
propagation. I took the challenge and shortly came up with such a rule.

With respect to reversibility, the first satisfactory RUCA was done by
Norman Margolus. I shortly thereafter found a simple RUCA that didn’t need
the use of the Margolus Neighborhood trick.
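One widely used way to make any cellular automaton reversible without a special neighbourhood is the “second-order” construction, often attributed to Fredkin himself: the new state is the ordinary rule applied to the current state, XORed with the state from the step before. The sketch below illustrates that general idea in one dimension; it is not a reconstruction of the specific RUCA Ed mentions, and the base rule (elementary rule 30, on a cyclic row) and the function names are assumptions. Because s(t−1) = f(s(t)) XOR s(t+1), swapping the last two states and running the same rule retraces the history exactly.

```python
def rule_step(cells, rule):
    """One step of an elementary (two-colour, nearest-neighbour) CA on a cyclic row."""
    n = len(cells)
    return [
        # The (left, centre, right) neighbourhood is read as a 3-bit index into the rule number.
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def second_order_step(prev, cur, rule):
    """Second-order rule: next = f(cur) XOR prev. Returns the new (prev, cur) pair."""
    nxt = [a ^ b for a, b in zip(rule_step(cur, rule), prev)]
    return cur, nxt


def run(prev, cur, rule, steps):
    for _ in range(steps):
        prev, cur = second_order_step(prev, cur, rule)
    return prev, cur


width = 31
start = [0] * width
start[width // 2] = 1
blank = [0] * width

# Run forwards, then swap the last two states and apply the same rule again.
prev, cur = run(blank, start, 30, 100)
back_prev, back_cur = run(cur, prev, 30, 100)
print(back_prev == start and back_cur == blank)
```

Any base rule works here, since reversibility comes from the XOR with the previous step rather than from any property of the rule itself.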

> 4. Do you still have any material from the simulations you did
> (printouts, tapes, programs, etc.)?

Yes, Probably, quite a bit

> 5. When you learn about the “munching squares” display hack? How did it
> relate to your work on the XOR CA?

I don’t recall it having any effect. It’s very unlikely that I knew of it
prior to the XOR CA.

> 6. What did you know about the work done by Unger etc. on cellular image
> processors? How did this relate to your work?

I knew of it second hand, but I don’t think it had any effect. Do you know
about Farley and Clark (Wes Clark) and their publication while at MIT’s
Lincoln Labs in the late 50’s?

> 7. What did you know about von Neumann’s work on cellular automata? How
> did it relate to your work?

At the time I did the XOR work I had not read anything about the von Neumann
CA, but I was told about it and I understood the concept very well. Many
years later I read something (by Burkes [Burks], I think). I remember knowing that
it was a 29 state system and that it knew left from right in order to extend
and turn its construction arm.

> 8. What did you know about Ulam and others’ work at Los Alamos on
> simulating cellular automata? How did it relate to your work?

All I knew about Ulam and CA is that, like the Hydrogen Bomb, he had key
ideas but probably didn’t get as much credit as he deserved. All my
knowledge re Ulam was anecdotal. As to what he did vs. what von Neumann did
I didn’t really know anything.
I didn’t know anything about anyone else actually simulating CA’s however
I’m pretty sure I assumed that others must have done so. It was so easy and
so obvious. While the use of a computer with a display (such as the Lincoln
Lab TX-0 and TX-2, the Digital PDP-1 and the IBM 709 and 7090 all had or
could have CRT displays, it was easy enough to display simple CA’s with a
printer, even a 10 CPS teletype.

> 9. Were you aware of work on cryptographic applications of CA-like
> systems?

I thought I invented that idea! As soon as I found ways to make RUCA’s it
occurred to me that they could be used for cryptography. As an aside, when
Witt Diffey [Whit Diffie] came up with the idea of public key cryptography,
which needed a trapdoor function, I thought of using the product of 2 large primes.
I had just written the first program, in LISP, to implement Michael Rabin’s first
version of a probabilistic prime test. As soon as I implemented it I
started a search at 10^100 and discovered that 10^100 +35,737 and 10^100
+35,739 were prime. A week later I met Rich Schroeppel in LA (he was
working for my company, III) and knowing a larger prime pair than anyone
else on Earth I told Rich and he was blown away. He was seated at a PDP-10
terminal and all he said was an emphatic “Really!” He then went type, type,
type for a few seconds and turned around and said “You’re right!” which blew
me away! I asked what he did and he said (while knowing nothing of Rabin’s
method) “all I did was look at 3^(n-1) mod n, you know, Fermat’s little
theorem, it usually gives 1 for primes.”

I’m rambling, probably about stuff of no interest to you. Anyway, I stopped
Ron Rivest in the hallway at Tech Square and asked if he had heard of
Diffey’s [Diffie’s] stuff. I don’t remember exactly what he said but I know that when
I told him that Rabin’s new method to find large primes meant that the
product of 2 primes was a good trapdoor function he was surprised and
thought it was a good idea! I never thought any more about it and hadn’t
come up with the idea of using the phi function… Years later, long after
RSA was a big thing Ron reminded me of the event… Don Knuth told me that
he also thought of using the product of 2 primes before RSA, but he couldn’t
have known about Rabin’s method when I did (as Rabin told it to me right
when he thought it up!)
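The “phi function” Ed says he hadn’t thought of is Euler’s totient φ(n) = (p−1)(q−1), which is easy to compute only for someone who knows the two prime factors, and which is what lets the private exponent be derived in RSA. A toy sketch, with the tiny primes 61 and 53 and public exponent 17 chosen purely for illustration (real keys use primes hundreds of digits long):

```python
from math import gcd

# Toy parameters (illustrative only).
p, q = 61, 53
n = p * q                  # public modulus: the "product of 2 primes"
phi = (p - 1) * (q - 1)    # Euler's totient of n, easy only if p and q are known
e = 17                     # public exponent, must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)        # private exponent: inverse of e mod phi (Python 3.8+)


def encrypt(m):
    return pow(m, e, n)


def decrypt(c):
    return pow(c, d, n)


message = 65
cipher = encrypt(message)
print(n, phi, d, cipher, decrypt(cipher))  # prints: 3233 3120 2753 2790 65
```

Anyone can encrypt with (n, e), but computing d requires φ(n), and hence the factorization of n: that is the trapdoor.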

By the way, I have an interesting algorithm for factoring smaller numbers,
such as can be done in less than an hour with Mathematica (normal
FactorInteger or ECM). I’ve written a few terribly unoptimized Mathematica
functions that implement the method. For what its good for, my Mathematica
functions (not compiled or anything) make Mathematica factor in a lot less
real time than Mathematica does with FactorInteger or ECM.

The big news re me and my work is what’s happening right now. Whatever one
thinks about my stuff (Digital Mechanics), it’s vastly improved. However
it’s still very far from a complete theory. Of course, Digital Mechanics is
about CA’s.

If you have any interest in reversibility, I’ve done lots in that area,
ranging from RUCAs, conservative logic, and my transforms. The transforms
are general methods of converting algorithms that calculate the approximate
time evolution of a system (approximate because of round off, truncation and
the finite delta t) which is approximately reversible (by changing delta t
to minus delta t) into an equivalent algorithm that calculates approximately
the same thing going forwards, but which is exactly reversible (being
calculated on a computer with round off and truncation error). I also have
a lot of methods for making RUCA’s with particular properties.
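Ed doesn’t spell out how his transforms work, so what follows is not them. It is a minimal sketch, under stated assumptions, of the general phenomenon he describes: a position-Verlet update x(t+1) = 2x(t) − x(t−1) + F(x(t)), carried out on integers (fixed-point numbers), is exactly reversible no matter how F is rounded, because solving the same formula for x(t−1) gives an update of identical form. The spring force, the constant 0.01 and the fixed-point scale are arbitrary choices for illustration.

```python
SCALE = 10**6  # fixed-point scale: the integer 10**6 represents 1.0


def force(x):
    """Hypothetical spring force -0.01 * x, rounded to an integer.
    The rounding is deliberate: exact reversibility must survive it."""
    return round(-0.01 * x)


def verlet_step(x_prev, x_cur):
    """Integer position Verlet: x_next = 2*x_cur - x_prev + F(x_cur)."""
    return x_cur, 2 * x_cur - x_prev + force(x_cur)


def run(x_prev, x_cur, steps):
    for _ in range(steps):
        x_prev, x_cur = verlet_step(x_prev, x_cur)
    return x_prev, x_cur


start = (SCALE, SCALE + 1234)  # (x at t = -1, x at t = 0): a slowly moving particle
a, b = run(*start, 10_000)
# Swap the last two positions (reversing the velocity) and run the same rule.
back = run(b, a, 10_000)
print(back == (start[1], start[0]))
```

Round-off still makes the trajectory differ from the exact solution of the differential equation, but it no longer destroys information: the computed history can always be run backwards bit for bit.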

You’ve criticized me in the past for not publishing stuff, but I’m so
ambitious as to what I’m trying to do that I haven’t had the motivation to
publish all the little things I’ve uncovered along the way.

I’m sure I discovered more and better ways to make all kinds of RUCAs before
anyone else with the exception of the rule found by my student, Margolus.

Finally, one last anecdote. You and I were at some meeting long ago (maybe
Santa Fe?) and you brought along an early Sun to demonstrate your collection
of different kinds of 1-D CA’s. After your talk, I asked you why none of
the CA’s you showed were reversible. Your response was “Because all
reversible CA’s are trivial.” That really was a very common belief,
coincident with most people’s intuition. On the spot, I made up a rule,
using your convention for specifying it, of an “interesting” reversible CA.
You typed it in and ran it. Being surprised is one of the best kinds of
experiences we ever have.

As Emerson once quipped, “My apologies for such a long email, I didn’t have
the time to write you a short one.”

I’m having fun; it’s a good thing to do!

Best regards

Ed F

PS If you have any interest in having parts of your book read so that you
can get comments prior to publication, I have an idea that might be useful.
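Ed doesn’t spell out his transforms here, so what follows is not his method. It is a minimal sketch of the standard second-order construction, the “mod 2 trick” that comes up later in this correspondence. The next state is computed as f(current) XOR previous. Because XOR undoes itself, previous = f(current) XOR next, so the dynamics can be run backward exactly, with no round-off at all, whatever rule f is. Several details are illustrative assumptions: the base rule (rule 30), the ring of 31 cells, the step count and the function names.

```python
def elementary_step(cells, rule):
    """One step of a 2-color nearest-neighbor rule (Wolfram numbering) on a ring."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def second_order_step(prev, cur, rule):
    """Mod 2 construction: (prev, cur) -> (cur, f(cur) XOR prev)."""
    f = elementary_step(cur, rule)
    return cur, [a ^ b for a, b in zip(f, prev)]

def run(prev, cur, rule, steps):
    """Apply second_order_step `steps` times."""
    for _ in range(steps):
        prev, cur = second_order_step(prev, cur, rule)
    return prev, cur

# Forward: start from a single black cell, with an all-white "previous" row.
n = 31
start_prev = [0] * n
start_cur = [0] * n
start_cur[n // 2] = 1
end_prev, end_cur = run(start_prev, start_cur, rule=30, steps=100)

# Backward: swap the two time slices and run the very same update.
back_prev, back_cur = run(end_cur, end_prev, rule=30, steps=100)

# The result is the original pair, swapped: no information was lost.
print(back_prev == start_cur and back_cur == start_prev)  # True
```

The same update rule serves for both directions; only the order of the two stored time slices changes.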

A little later he added:

Subject: error

Date: Fri, 15 Sep 2000 10:12:15 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

Looking at my long email I noticed a boo boo.

Where I wrote, quoting Schroeppel talking about 3^(n-1) mod n, “…it
usually give a 1 for primes…” very true but a bit of an understatement.
Of course, it ALWAYS gives a 1 for primes! What Schroeppel said was that it
usually doesn’t give a 1 for non-primes. It’s incorrect for 91 and 121 and
lots of other small numbers, but seems to work better for large numbers…
but then you probably know much more about such things than I do. Also
looking at your questions, I had the feeling that some might have been
prompted by my circa 1990 Digital Mechanics paper. If so, I guess I
repeated stuff already in the paper and I apologise.

Regards,

Ed F
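The test Ed and Schroeppel are discussing is the base-3 Fermat test. By Fermat’s little theorem, 3^(n-1) mod n is 1 for every prime n other than 3; for n = 3 itself the result is 0, so “always” needs that one exception. Composites that also give 1 are called base-3 Fermat pseudoprimes. The short sketch below confirms that 91 and 121 are the first two. The helper names are illustrative.

```python
def is_prime(n):
    """Trial division; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def passes_base3_fermat(n):
    """True if 3^(n-1) mod n == 1."""
    return pow(3, n - 1, n) == 1

# Every prime except 3 passes (Fermat's little theorem).
assert all(passes_base3_fermat(p) for p in range(2, 2000) if is_prime(p) and p != 3)

# Composites that pass anyway are base-3 Fermat pseudoprimes.
pseudoprimes = [n for n in range(4, 2000) if passes_base3_fermat(n) and not is_prime(n)]
print(pseudoprimes[:2])  # [91, 121]
```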

I responded, asking for various pieces of clarification (and now that I’m writing this piece I would have asked even more, because some key parts of what Ed said I now realize don’t add up):

Subject: Re: your mail

Date: Wed, 20 Sep 2000 21:04:54 -0500

From: Stephen Wolfram

To: Ed Fredkin

…

>> 1. As far as you know, did you invent the 2D XOR CA rule?
>>
> Yes, as far as I know I did invent it. Here is what I did. ….
>

Very interesting.

1a. Did you ever look at 1D CAs? If not, why not?

1b. Did you think about analogies between XOR rules and linear feedback
shift registers?

1c. Did you think about analogies between XOR rules and Pascal’s
triangle?

By the way, the result about the number of binomial coefficients mod a
prime has been independently discovered a remarkable number of times
(including by me). The earliest references I know are Edouard Lucas
(1877) and James Glaisher (1899).
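For concreteness, the result referred to here is presumably the count that follows from Lucas’ theorem. The number of entries in row n of Pascal’s triangle not divisible by a prime p is the product of (d + 1) over the base-p digits d of n. For p = 2 this is 2 raised to the number of 1s in the binary form of n, which is Glaisher’s result. The sketch also illustrates question 1c. In the 1D XOR rule, each cell becomes the XOR of its two neighbors (elementary rule 90). Started from one black cell, this rule reproduces Pascal’s triangle mod 2. The function names are hypothetical.

```python
from math import comb

def rule90_rows(steps):
    """Rows of rule 90 (new cell = left XOR right) grown from one black cell."""
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1
    rows = [row]
    for _ in range(steps):
        row = [
            (row[i - 1] if i > 0 else 0) ^ (row[i + 1] if i < width - 1 else 0)
            for i in range(width)
        ]
        rows.append(row)
    return rows

def pascal_mod2_row(t, steps):
    """Row t of Pascal's triangle mod 2, spread out on the same grid as rule90_rows."""
    row = [0] * (2 * steps + 1)
    for k in range(t + 1):
        row[steps - t + 2 * k] = comb(t, k) % 2
    return row

def count_nonzero_mod_p(n, p):
    """How many C(n, k), 0 <= k <= n, are not divisible by p."""
    return sum(1 for k in range(n + 1) if comb(n, k) % p)

def digit_formula(n, p):
    """Product of (d + 1) over the base-p digits d of n."""
    result = 1
    while n:
        n, d = divmod(n, p)
        result *= d + 1
    return result

steps = 32
rows = rule90_rows(steps)
# The XOR rule and Pascal's triangle mod 2 agree row by row.
assert all(rows[t] == pascal_mod2_row(t, steps) for t in range(steps + 1))
# The digit formula matches a direct count for several primes.
assert all(count_nonzero_mod_p(n, p) == digit_formula(n, p)
           for p in (2, 3, 5, 7) for n in range(200))
```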

….

>> 3. What other CA rules did you study at that time?

> … I explored so many different rules that I probably
would have found the game of Life had I not put blinders on.

By the way, I happened to have a long phone conversation recently with
John Conway about the history of the Game of Life. I still haven’t
quite got to the bottom of exactly what Conway was doing and why (I
think he wants some of the history lost, which is a pity, because it is
interesting and reflects much better on him than he seems to
believe…) But what is clear is that Conway (and his various helpers)
had much more serious motivations from recursive function theory etc.
than is ever usually mentioned. It was just not a “find an amusing
game” etc. piece of work.

> Marvin Minsky challenged me to find a rule (any rule) that showed spherical
propagation. I took the challenge and shortly came up with such a rule.

I don’t believe I’ve ever seen your rule of this kind. I showed such a
rule to Marvin in 1984 and he said “that’s very interesting; we were
looking for these but hadn’t found any”. So I’m confused about
this….

…

>> 6. What did you know about the work done by Unger etc. on cellular image
processors? How did this relate to your work?

> I knew of it second hand, but I don’t think it had any effect.

Wasn’t BBN quite involved with cellular image processing? And I believe
you worked on aerial photography analysis. Did you use cellular
automata for image processing?

…

>> 9. Were you aware of work on cryptographic applications of CA-like
systems?

> I thought I invented that idea!

There was a lot of work done on 1D CAs by some distinguished
mathematicians consulting for the NSA in the late 1950s. I think much
of it is still classified. But over the years I’ve talked to many of
the people involved (Gustav Hedlund, Andrew Gleason, John Milnor, some
NSA folk, etc. etc.), and read their unclassified papers. They figured
out some interesting stuff. They thought of it as related to nonlinear
feedback shift registers.
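A feedback shift register moves its bits along one position each step and computes the incoming bit from some of the current bits. With XOR (linear) feedback, the output is a linear recurrence over GF(2). XOR cellular automaton rules have the same kind of linearity. Making the feedback function nonlinear gives the nonlinear registers mentioned here. The minimal sketch below covers only the linear case. It assumes a 4-bit register with the recurrence a[n+4] = a[n+1] XOR a[n], whose characteristic polynomial x^4 + x + 1 is primitive over GF(2). As a result, every nonzero seed cycles through all 15 nonzero states. The register size, taps and names are illustrative choices.

```python
def lfsr_step(state):
    """Shift a 4-bit register: (a[n], a[n+1], a[n+2], a[n+3]) -> (a[n+1], ..., a[n+4]).

    Linear (XOR) feedback: a[n+4] = a[n+1] XOR a[n],
    i.e. characteristic polynomial x^4 + x + 1 over GF(2).
    """
    return state[1:] + (state[1] ^ state[0],)

def period(state, step, max_steps=1 << 16):
    """Number of steps until `state` recurs (None if not within max_steps)."""
    current = step(state)
    for count in range(1, max_steps + 1):
        if current == state:
            return count
        current = step(current)
    return None

def output_bits(state, step, count):
    """The bit stream read off the front of the register."""
    bits = []
    for _ in range(count):
        bits.append(state[0])
        state = step(state)
    return bits

seeds = [tuple((k >> i) & 1 for i in range(4)) for k in range(1, 16)]
print(sorted({period(s, lfsr_step) for s in seeds}))  # [15]
```

Replacing `state[1] ^ state[0]` with a nonlinear function, for example one involving AND, turns this into a nonlinear feedback shift register, whose cycle structure is much harder to analyze.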

> As soon as I found ways to make RUCA’s it
occurred to me that they could be used for cryptography.

How?

There’s a 1D CA (rule 30) that I studied in 1984 that has been
extensively used as a randomness generator (e.g. Random[Integer] in
Mathematica uses it), and that has been used a bit as a cryptosystem.
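As a rough illustration of how rule 30 can serve as a randomness generator, the sketch below grows rule 30 from a single black cell and reads off the center column. Each new cell is left XOR (center OR right). This is not Mathematica’s actual implementation. The register width, boundary conditions and bit-extraction scheme used there aren’t described in this correspondence, and the function name is hypothetical.

```python
def rule30_center_bits(count):
    """First `count` center-column cells of rule 30 grown from one black cell."""
    width = 2 * count + 1
    row = [0] * width
    row[count] = 1
    bits = []
    for _ in range(count):
        bits.append(row[count])
        # Rule 30: new cell = left XOR (center OR right); cells off the grid are 0.
        row = [
            (row[i - 1] if i > 0 else 0) ^ (row[i] | (row[i + 1] if i < width - 1 else 0))
            for i in range(width)
        ]
    return bits

print(rule30_center_bits(6))  # [1, 1, 0, 1, 1, 1]
```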

I tried to make a good public key system out of CAs in the mid-1980s
(mostly in collaboration with John Milnor), but did not come up with
anything satisfactory. …

…

> I also have a lot of methods for making RUCA’s with particular properties.

I am definitely somewhat interested in these things. They don’t happen
to be central to my grand scheme. But they are obviously worthwhile …
AND WORTH (you) WRITING DOWN!!

I’m sure I discovered more and better ways to make all kinds of RUCAs before
anyone else with the exception of the rule found by my student, Margolus.

Interesting. You probably know that the general problem of telling
whether an arbitrary 2D CA is reversible is undecidable (the question
can be mapped to the tiling problem).

So I’m taking it that you have some good methods for generating 2D
reversible CAs. That’s obviously interesting.

> Finally, one last anecdote. … I asked you why none of
the CA’s you showed were reversible. Your response was “Because all
reversible CA’s are trivial.” …

This anecdote can’t be quite right. I have known since 1982 that there
are nontrivial things that can happen in CAs that are made reversible by
your mod 2 trick. What is true (and may have been what I was saying)
is that none of the 2-color nearest neighbor CAs that are reversible are
non-trivial. With more colors or more neighbors, that changes. I’m
guessing that what you showed me was a 4-color nearest neighbor CA that
is reversible … and that is of course quite easy to get by recoding a
2-color one that has your mod 2 trick.
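The claim about 2-color nearest-neighbor rules can be checked by brute force. A rule can only be reversible if it is one-to-one on every cyclic ring of cells. The sketch below keeps the elementary rules that pass that test for rings of up to 8 cells. This is only a necessary condition. The known result is that exactly six elementary rules (15, 51, 85, 170, 204, 240) are reversible, and each is just the identity, the complement, a shift, or a complemented shift. The second part illustrates the recoding. If each cell stores a (current, previous) pair, the mod 2 construction becomes a first-order 4-color nearest-neighbor rule. Its inverse is obtained by swapping the two components, applying the same rule, and swapping back. The helper names and the ring-size cutoff are illustrative assumptions.

```python
from itertools import product

def step(cells, rule):
    """One step of a 2-color nearest-neighbor rule (Wolfram numbering) on a ring."""
    n = len(cells)
    return tuple(
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    )

def injective_on_rings(rule, max_size):
    """True if the rule is one-to-one on every ring of 1..max_size cells."""
    for n in range(1, max_size + 1):
        images = {step(c, rule) for c in product((0, 1), repeat=n)}
        if len(images) != 2 ** n:
            return False
    return True

# Necessary-condition filter over all 256 elementary rules.
candidates = [r for r in range(256) if injective_on_rings(r, 8)]

def four_color_step(cells, rule):
    """Mod 2 construction recoded as a 4-color rule.

    Each cell is a pair (cur, prev). The new pair is (f(cur) XOR prev, cur),
    where f only looks at the nearest neighbors' cur values.
    """
    f = step(tuple(c for c, _ in cells), rule)
    return tuple((f[i] ^ cells[i][1], cells[i][0]) for i in range(len(cells)))

def four_color_inverse(cells, rule):
    """Undo four_color_step: swap components, apply the same rule, swap back."""
    swapped = tuple((p, c) for c, p in cells)
    return tuple((p, c) for c, p in four_color_step(swapped, rule))

if __name__ == "__main__":
    print(candidates)
    start = ((1, 0), (0, 1), (1, 1), (0, 0), (1, 0))
    assert four_color_inverse(four_color_step(start, 30), 30) == start
```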

By the way, I heard third hand a while back that you had “introduced me
to CAs”. For what it’s worth, that isn’t correct. My first “CA
experience” was actually in 1973 (when I was 13) when I tried to program
molecular dynamics on a very small computer, and ended up with something
equivalent to the square CA fluid model. My next CA experience was in
summer 1981. I was trying to make models of “self organizing” systems
(now I hate that term), particularly self-gravitating gases. I ended up
simplifying the models until I got 1D CAs. That fall I spent a month at
the Institute for Advanced Study, and spent a lot of time studying von
Neumann’s work, etc., and analysing all sorts of features of 1D CAs. I
came for a day to give a talk at MIT, and was having dinner with some
LCS people (Rich Zippel was one of them), and they told me about your
work. Later that fall I talked with Feynman a certain amount about what
I was doing with CAs, and he again mentioned you. (I think he had been
to your Physics of Computation meeting, which was perhaps in June 1981,
but I didn’t discuss the CA aspects of the meeting with him.) Then in
[January 1982] I came to the meeting you had on your island, and Tom Toffoli
showed me his 2D CA machine (at the time he gave me the impression of
95% hackery, 5% science), and you showed me the 2D XOR CA on a PERQ
computer.

Ed didn’t respond to this, but three days later we talked on the phone. I sent some (unvarnished) notes from the call to a research assistant of mine:

Subject: Fredkin conversation

Date: Sat, 23 Sep 2000 03:05:43 -0500

From: Stephen Wolfram

I had a long conversation with Ed this evening.

About his work in science, my work in science, etc.

A few things mentioned:

– He feels bitter that his paper on reversible logic, coauthored with
Tom Toffoli, was actually all his (Ed’s) work

– He is pleased that I will discuss history even when people haven’t
published things (of course he has published little)

– He says he has written about 150 pages about his views of physics; he
is planning to prepare something, perhaps for publication, in about a
year

– He says he missed not being able to bounce ideas off Dick Feynman …
even though Feynman often ended up screaming at him (Ed) about how dumb
his ideas were

– He said that his main problem was that he has been trying to get
people to steal his ideas for years, but nobody was interested

– He said that now “for some reason” he is becoming more concerned about
matters of credit

– He is a serious fan of Mathematica, the Mathematica Book, etc.

– He made an effort again (he’s been trying for 20 years) to get me to
coauthor a paper with him. He recognizes that he can’t write a credible
scientific paper, but he’s “sure he has some ideas I haven’t thought
of”. I told him that unfortunately I haven’t written a paper for 15
years.

– I told him that particularly when I’m in the Boston area, I’ll look
forward to chatting with him about physics etc.

– He said he’s tried to interact some with Gerhardt ‘t Hooft, but that
‘t Hooft keeps on rushing off in traditional physics directions that Ed
(and I, by the way) think are stupid

– He wanted to know if I really believed that all of physics etc. was
ultimately discrete; he expressed the opinion that he and I may be the
only people in the world who actually believe that right now

– He told a bizarre story about how Don Knuth gave a talk at MIT
recently on computers and religion, and how 1/4 of it was stuff that Don
had heard about from Ed. Apparently Guy Steele asked a question about
how Don’s stuff related to Ed’s, and Don said something meaningless.

I talked to him a little more about the CA history stuff. He mentioned
that around 1961 a certain Henry Stommel (sp?) told him that CA-like
models had been used in studying sand dunes in the 1930s. I have a
feeling this may be another cat gut search, but perhaps we can follow
up. (You could email Ed at the appropriate time.)

I asked Ed if he had ever looked at cryptography (as in NSA style stuff)
with CAs. He said no. But that in the late 1960’s he had had a student
who had studied ways to make counters out of JK flip flops … and that
that person’s work had made something that Ed thought could be used for
cryptography. This was followed up by a certain Vera Pless
subsequently.

I didn’t hear anything more from Ed for a while, though a public records search indicates that, yes, he had successfully “worked the system” to get $100k from the NSF for “The Digital Perspective Project”. And on May 1, 2001, I received a rather formal email from Ed (for some reason Americans born before about 1955 seem to reflexively call me “Steve”):

Subject: Workshop on the Digital Perspective 24-26 July, Washington DC

Date: Tue, 1 May 2001 21:20:45 -0400

From: Ed Fredkin

To: Steve Wolfram

We are sending this email to invite you to an NSF-sponsored workshop on the
Digital Perspective in Physics planned for July 24th through the 26th,
Tuesday, Wednesday and Thursday. It will be held in the NSF building,
Arlington Virginia. Gerard ‘t Hooft has already agreed to present a paper
and we hope that you will also be willing to contribute. We intend to
combine the papers presented at the workshop into a monograph that will be
published later this year. Two earlier workshops on related subjects were
held at Moskito Island and this was a central theme at a meeting held at
MIT’s Endicott house in 1982. Participants at previous meetings included
Charles Bennett, Richard Feynman, Ed Fredkin, Leo Kadanoff, Rolf Landauer,
Norman Margolus, Tomasso Toffoli, John Wheeler, Ken Wilson, Stephen Wolfram
and others.

…

When I didn’t immediately respond, Ed called my assistant, saying that he was “calling regarding a meeting he spoke with [me] about on the phone”. I responded by email later the same day:

Subject: I gather you called…

Date: Tue, 15 May 2001 15:18:57 -0500

From: Stephen Wolfram

To: Ed Fredkin

Sorry for not getting back to you sooner….

I myself am right now trying to work at absolutely full capacity to finish my
book/project. I haven’t done any travelling at all for a long time, and won’t
until my book is done.

And I also don’t yet have anything public to say about my work on physics.

Hopefully by the end of the year my book will be done and I will have quite a
bit to say.

However, it occurs to me that one or two of my assistants might be very good
people to come to your workshop.

Who all is coming?

One person you should definitely invite is someone who has been an assistant of
mine, and now works part time for me, and part time on his own projects. His
name is David Hillman, and he’s been interested in discrete models of spacetime
for a long time. (He got his PhD working on some kind of generalization of
cellular automata intended as a spacetime model.)

I have two physics assistants, and one math one, who might be relevant for your
workshop.

Just let me know in more detail who might be coming, and I’ll try to figure out
the correct person/people to suggest.

Of course I’d love to come myself if I were a free man. But not until the book
is done.

In haste,

— Stephen

Ed responded pleasantly enough:

Subject: RE: I gather you called…

Date: Tue, 15 May 2001 17:08:39 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

Sorry you can’t make it.

About half of those coming are veterans of some Moskito Island workshop.
Newcomers include Gerry Sussman, Tom Knight, Gerard ‘t Hooft, John Negele,
John Conway, Raj Reddy, Jack Wisdom, Seth Lloyd, David di Vincenzo, plus a
number of students, etc. A couple of those mentioned are still struggling
with scheduling issues.

But, in any case, I would be pleased to have David Hillman come to the
workshop. Send me his email address and I will send him an invitation.

Best regards and good luck on the book!

I responded and suggested an additional person from our team for his workshop. Nearly a month passed with no word from Ed, so I pinged him asking what was going on. No response. It was a very busy time for me, and this wasn’t something I wanted to be chasing (I saw myself as doing Ed a favor by offering to send people to his workshop) … so I sent a slightly exasperated email:

Subject: your conference, again

Date: Fri, 15 Jun 2001 06:08:39 -0500

From: Stephen Wolfram

To: Ed Fredkin

Look … I’m now in a bit of an embarrassing situation: following your
initial response, I told David Hillman and David Reiss about your conference
… assuming you’d want to invite them … and they both became quite
interested in it. But they never heard from anyone about it. So now of
course they’re wondering what’s going on. And so am I. What should I tell
them? I’m now embarrassed about having suggested this…

This seems peculiarly un-you-like. I was thinking you must have been away
or something. But isn’t the conference coming up very soon?

I hope everything’s OK…

Still no response from Ed. A week later I called him, and we talked for two hours. It wasn’t clear why he hadn’t already reached out to the people I’d suggested, but he quickly said he would. And then Ed launched into telling me about the “astounding” cellular automaton models he said he’d just created that “had charge, energy, momentum, angular momentum, etc.”. He talked about things like the idea of what he called an “infoton” that would be an “information particle” that would “make Feynman diagrams reversible”. I explained why that didn’t make any real sense given how Feynman diagrams actually work. It was the same kind of conversation I had many times with Ed. I kept trying to explain what was known in physics, and he kept on coming back with things that, yes, I think I understood, but that seemed close to typical crackpot fare to me. But Ed seemed convinced he had discovered something great (though exactly what I couldn’t divine). And eventually—having obviously not convinced me of what he was doing “on its merits”—he just came out and said “It must be related to stuff you’re doing, one way or another”.

I explained that I really didn’t think that was very likely, not least because I emphatically wasn’t trying to use cellular automata as models of fundamental physics. And with that, Ed launched into a long speech about giving credit, particularly to him. I explained that I was trying hard to write correct history, and reiterated some of the questions I’d asked him before. He didn’t really tell me more, but instead regaled me with stories (that I’d mostly heard many times before from him) about how he’d been the first to figure out this and that—apparently oblivious to the historical research I tried to tell him about. But eventually we both had to go—and the conversation ended pleasantly enough, with him confirming the email addresses for the two people for his workshop.

As the workshop approached, the people from my team had made arrangements to go to Washington, DC—but still didn’t know where exactly the workshop was. With days to go, one of them simply called Ed to ask. But Ed told them that actually they couldn’t come, because “Raj Reddy says there is no room for you”. Really? No extra chair to be found? Ed was the organizer, wasn’t he? Why was he laying this on someone else? It seemed to me that Ed was playing some kind of game. But at that moment I was too busy trying to finish my book to think about it. (Now that I’m writing this piece, however, I realize that Ed was perhaps following an “algorithm” he’d established years earlier when he was proud to have organized a meeting to push forward his ideas about timesharing—by inviting just people who supported his ideas, and not inviting ones who didn’t. I don’t know if the meeting actually happened, or what went on there. I don’t think the writeup promised in the invitation and in the NSF contract ever materialized.)

In January 2002 A New Kind of Science was off to the printer, and review copies were starting to be sent out. In late March a seasoned journalist named Steven Levy (who had written about my work on cellular automata in the mid-1980s) was talking to someone from my team and reported that Ed had told him that “Minsky had told [Ed] to publish his stuff on the web to stake out priority” before my book came out. (And it’s a pity Ed didn’t do that, because it might have made it clear to him and everyone else how different what he was saying was from what I was saying.) But in any case Levy said that Ed seemed to be saying the same things as he’d said 15 years ago—and Levy knew that, whatever else might be said, I’d done vastly more since then.

After his conversation with Levy, Ed sent me mail:

Subject: The Book

Date: Fri, 22 Mar 2002 16:38:29 -0800 (PST)

From: Ed Fredkin

To: Stephen Wolfram

Congratulations on finishing!!!

I ordered the book from someplace, so long ago I can’t
remember from who. I’m wondering if, when its
possible, I could get a copy in advance of whenever my
ordered copy is going to appear. I just don’t want to
be the last on the block to see it. Of course I’d be
happy to pay if you can tell me how to do it.

Thanks,

Ed Fredkin

The book was going to be published on May 14; on May 4 I signed a copy for Ed:

The book mentioned Ed a total of 7 times. (The person with the most mentions overall was Alan Turing, at 19; Minsky had 13; Feynman 10.)

Ed never told me he’d received the book. And I’m not sure he ever seriously looked at it. But somehow he was convinced that since he knew it talked a lot about cellular automata, and had a section about physics, it must be about his big idea—that the universe is a cellular automaton. As one witty friend pointed out to me in connection with writing this piece, my book says only one thing about the universe being a cellular automaton: that it isn’t! But in any case, Ed apparently felt that I was stealing credit from him for his big idea—and, as I now realize, started an urgent campaign to right the perceived wrong, basically by telling people that somehow (despite all my efforts to describe the history) I wasn’t giving anyone enough credit and that “he was there first”. The New York Times rather diplomatically quoted Ed as saying “For me this is a great event. Wolfram is the first significant person to believe in this stuff. I’ve been very lonely”. It followed up by saying that “Mr. Fredkin, who said he was a longtime friend, said Dr. Wolfram had ‘an egregious lack of humility’”. (In some contexts, I suppose that might be a compliment.)

In writing this piece I asked Steven Levy what Ed had actually said in the interview he did. His first summary in reviewing his notes was “He says he considers you a friend and then goes on endlessly about what an egomaniac you are”. But then he sent me his actual notes, and they’re somewhat revealing. Ed doesn’t claim he introduced me to cellular automata, perhaps because he realizes that Levy knows from the 1980s that that isn’t true. But then Ed tells the story about showing me reversible cellular automata, which I’d explained to Ed wasn’t true. Ed goes on to say that “Everyone who’s in science wants credit, driven probably by wanting to become famous. [Wolfram] has a larger than normal dose”. Ed says that when he had said that cellular automata underlie physics, I’d said that was crazy. (Yup, that’s true.) But then Ed said “Now he denies this”. Huh? Ed went on: “He’s a prisoner of some kind of overactive ego. I believe he might not know. Wolfram deserves loads & loads of credit, but he has this personality flaw”. And so on.

A month later Ed wrote to me:

From: Ed Fredkin

To: Steve Wolfram

Sent: Friday, June 14, 2002 2:48 PM

Subject: ANKOS critics

Steve,

Sometime soon I’d like to get together and talk.

I’ve read a lot of your book.

Take a look at the draft of a little paper of mine (attached). I’d
appreciate comments.

Ed F

The following is my response someone else’s response [Gerry Sussman] to a review of ANKOS.

My comments are only with regard to Wolfram’s ideas on modeling physics.
I don’t happen to like his network model but we are in agreement that
some kind of discrete process might underlie QM.

Not everything Wolfram says is wrong.

…

The ideas that some kinds of discrete space-time processes (such as
CA’s) might underlie physics or other processes in nature is the BABY.
Everything else in ANKOS (or missing from ANKOS) is the BATH WATER.

Ed’s attached paper was basically yet another restatement of cellular automata as models of fundamental physics.

A few weeks later there was a strange (if in some ways charming) incident when a reporter for the San Francisco Chronicle decided to investigate what seemed to be a science feud between Ed and me. After a nod to medieval metaphysicians, the article (under the title “Cosmic Computer”) opens with “Nowadays, with a daring that might have dazzled St. Augustine and St. Thomas Aquinas, two titans of the computer world argue that everything in the universe is a kind of computer.” After analogizing me to Britney Spears, the article goes on to say “The excitement has also brought tension to the long-standing friendship between Wolfram and Fredkin, who are now wrestling with one of the bigger bummers of any scientist’s life: a dispute over originality.” The article reports: “Last week, the two men had a long, heartfelt phone conversation with each other, in which they tried to resolve their strong disagreement over priority. The conversation was amicable, but they failed to reach agreement.”

And so things remained until March 2003 when Ed sent the following:

Subject: Re: NKS 2003 Conference & Minicourse

Date: Thu, 20 Mar 2003 17:24:59 -0500

From: Edward Fredkin

To: Stephen Wolfram

Dear Stephen,

I guess I’m on a Wolfram mailing list for potential attendees for your
Boston conference. I hope you don’t mind a little plain speaking. I
consider that I am a friend of yours and therefor I take the risk of
telling the emperor about his new clothes. Of course, few others
would do so as a friend. Please don’t be offended as the plain talking
that follows is my attempt at trying to be constructive.

Your work is acquiring a reputation amongst the scientific community
that is much less than it deserves. I find myself often in the
position of defending you, your work and your accomplishments against
the negative views that many hold, even though they have little
understanding of the significance of what you have done. They are
turned off by your egregious behavior; it distracts much of the
scientific community rendering it barely possible for them to take you
seriously . You have invented and discovered quite a few things, but
so have others. You told me you would try to give credit in ANKOS
where credit was due; I believed you and I believe that you tried your
best but nevertheless you failed miserably. I guess you simply didn’t
know how. Consider this conference: Must this conference be a one man
show or might it actually be better for the ideas in ANKOS and better
for SW and his overall scientific reputation if it were a real
conference where others might address the same questions? Please don’t
kid yourself into thinking that no one else has anything original,
novel, important or interesting to say.

Of course, this so-called “…first ever conference” devoted to the “…
ideas and implications…” of concepts found in ANKOS might be nothing
more than a marketing tool for Mathematica and for sales of the ANKOS
book. If so, you ought to call a spade “spade”.

You’ve done enough things (and hopefully will continue to do so) to
ensure your reputation as a pioneer in various areas. This flood of
self puffery simply detracts, in the minds of many whose opinions you
ought to value, from the positive reputation you deserve.

I’m not one of those whose opinion of your work is in any way affected
by your unfortunate behavior. I see and understand exactly what you’ve
done and I know and understand what your work is based on. I am human,
so I find it interesting when you now and then claim to have discovered
an idea or fact that I personally explained to you when it was
perfectly clear at the time that to you, the idea was absolutely novel.
My model of you is that your overpowering motivation results in your
mind playing tricks on you. I really believe that you actually forget;
that you actually re-remember the past differently than it happened.

But I am the eternal optimist. I believe that even Stephen Wolfram
might someday come around and join the collegial scientific community
where you receive credit and give credit; both nearly effortlessly.
The world actually might voluntarily heap honors on you as opposed to
SW having to orchestrate “conferences” for the glorification of SW and
all the ideas claimed by SW. No one knows better than me how slow and
torturous this process can be for new and novel big concepts, but
patience and modestly [sic] still seems like the better path.

Please try to not be offended. I actually mean well. If you ever have
an actual, real conference, invite me to be a speaker; I’ll come. If I
organize another conference you can rest assured that you will be
invited again (as you were for the NSF Workshop) and I hope you will
come to talk about your ideas and maybe — maybe even stay to hear what
others have to say on the subject. It’s not too healthy to the
scientific mind to be the only real speaker at conferences you organize
and hype for yourself.

Among the very few who really are able to appreciate what you’ve done,
I am one of your greatest supporters. But I am not your average person
with more or less normal reactions. When you reach for extra glory
and credit by stealing one of my ideas, my reaction is: “I admire your
good taste”.

Best regards

Ed F

On Thursday, March 20, 2003, at 02:19 PM, Stephen Wolfram wrote:

> In June of this year we’re going to be holding the first-ever
> conference devoted to the ideas and implications of A NEW KIND OF
> SCIENCE. I think it’ll be an exciting and unique event. And if you’re
> interested in any facet of NKS or its implications, you should plan
> to come!
>
> I’ll be giving a series of in-depth lectures to explain the core
> ideas of NKS. There’ll be more specialized sessions exploring
> implications and applications in areas such as computer science,
> biology, social science, physics, mathematics, philosophy, and future
> of technology. And there’ll also be workshops and case studies about
> such issues as modelling, computer experimentation, defining NKS
> problems, NKS-based education–as well as a gallery of NKS-based
> art pieces.
>
> I’d expected that it’d be a few years before it would make sense to
> start having NKS conferences. But things have gone faster than I
> expected, and the enthusiasm and energy we’ve seen in the ten months
> since the book was published has made it clear that it’s time to have
> the first NKS conference.
>
> In planning NKS 2003, we want to cater to as broad a range of
> attendees as possible. There’ll be many professional scientists
> coming, as well as technologists and other researchers from a very
> wide range of fields. There’ll also be a large number of educators
> and students, as well as all sorts of individuals with general
> interests in the ideas and implications of NKS.
>
> We’ll be holding NKS 2003 near Boston over the weekend of June 27-29,
> 2003. There’s more information and registration details at
> http://www.wolframscience.com/nks2003
>
> It’s going to be an extremely stimulating weekend–and a unique
> opportunity to meet a broad cross-section of people interested in new
> ideas.
>
> I hope you’ll be able to be part of this pioneering event!
>
>
> — Stephen Wolfram

I responded:

Subject: Re: NKS 2003 Conference & Minicourse

Date: Sat, 22 Mar 2003 22:19:33 -0500

From: Stephen Wolfram

To: Edward Fredkin

Ed —

I must say that I am reluctant to respond to a note like the one below, but
it seems a pity to let things end this way.

I can tell you’re very angry … but beyond that I really can’t tell too
much.

I’d always thought we had a fine, largely social, relationship. We talked
about many kinds of things. It was fun. Occasionally we talked about
science. In the early 1980s I learned a few things about cellular automata
from you. None were extremely influential to me, but they were fine things
that you should be proud of having figured out—and in fact I took some
trouble to mention them in the notes to NKS.

You also told me some of your thinking about fundamental physics. I was (I
hope) polite, and tried to be helpful. But I always found what you were
saying quite vague and slippery—and when it became definite it usually
seemed very naive. I think it’s a great pity that you’ve never taken the
time to learn the technical details of physics as it’s currently practiced.
There’s a lot known. And if you understood it, I think you’d be able to
tell quite quickly which of your ideas are totally naive, and which might
actually be interesting.

I think it’s also a pity that—so far as I can tell—you’ve never really
taken the time to understand what I’ve done. It’s in the end pretty
nontrivial stuff. It’s not just saying something like “the universe is a
cellular automaton” or “I have a philosophy that the universe is like a
computer”. It’s a big and rich intellectual structure, built on a lot of
solid results and detailed, careful, analysis. That among other things
happens to give a bunch of ideas about how physics might actually
work—that have (so far as I know) almost nothing to do with things you’ve
been talking about.

I do agree with your belief that the universe is ultimately discrete. But
of course many people have for a long time said that they thought the
universe might at some level be discrete. Some of those people (like
Wheeler, Penrose, Finkelstein, etc.) are sophisticated physicists, and what
they’ve said has lots of real content—it’s not just vague essay-type
stuff. Now, I don’t happen to think what they’ve specifically proposed is
correct. But you would be completely wrong to think (as you seem to) that
somehow the idea that the universe might be discrete originated with you.

I really encourage you to read NKS in detail, including the notes at the
back. I think there’s a lot more there than you imagine. And I think if
you really understood it, you would be completely embarrassed to write a
note like the one below.

You’ve never struck me as being someone who is terribly interested in other
peoples’ ideas. And that’s of course fine. But you shouldn’t assume you
know their ideas just on the basis of a few buzzphrases or some such. In
some areas of business, that approach often works. Because, as we both
know, the ideas typically aren’t that deep. But it won’t work with a
character like me doing science. There’s too much nontrivial content. You
have to actually dig in to understand it. And from the things you say you
obviously haven’t.

For twenty years I thought we had a fine personal relationship. I thought
it was a little odd that you seemed to go around telling people that you had
introduced me to cellular automata. We talked about this a few times, and
you admitted this wasn’t a true story. But while I thought it was a little
unreasonable for you to keep on saying something you knew wasn’t true, I
didn’t pay much attention. It never really got in the way of our
relationship.

And then there was the incident of your NSF-funded conference. You invited
me. I said I couldn’t come. And suggested two alternates. You said fine.
But then you never contacted these people. Which was rather embarrassing
for me. And then, when David Reiss contacted you, you told him the
conference “was full”.

Later, when we talked about it, you admitted that that was a lie—and then
blamed the lie on Raj Reddy.

Frankly, I was flabbergasted by all this. That’s not the kind of
interaction someone like me expects to have with a seasoned high-level
operative like yourself. Yes, that’s the kind of thing some sleazy young
businessperson might do. But not a mature businessperson who has run
companies and things.

I still have no idea what you were thinking of. But it thoroughly shook my
confidence in you as someone I could interact straightforwardly with.

And then, of course, there’s the question of what you’ve said to journalists
etc. about NKS. In detail, I don’t have much idea. But something fishy
was surely going on. I haven’t gone and studied all the quotes from you.
But certainly my impression was that you were trying to claim that really
lots of key things in NKS were things you had done or said first.

You know that I tried to research the history carefully. And unless I
missed something quite huge, your contributions to NKS were extremely minor,
and are certainly accurately represented in the history notes. Now of
course if you don’t actually understand what I’ve done in NKS, that may be
hard to see. But I can’t really help you with that.

OK, where do we go from here?

We talked at some length when that reporter from a San Francisco paper was
trying to write a story about you and NKS. I thought we had a decent
conversation. But then, so far as I could tell, you went right ahead and
told the reporter—again—exactly a bunch of things we’d agreed in our
conversation weren’t true.

It was the same pattern as with telling people that you’d introduced me to
cellular automata. And it resonated in a bad way with the lie you told
about your conference.

I would have expected vastly better from you. I must say that I was
personally most disappointed. And I concluded with much regret that I must
have seriously misjudged you all these years.

I would like nothing more than to be able to mend our relationship, and go
back to the kind of pleasant social interactions we have always had.

How can that be achieved? Perhaps it’s impossible. But one step is that
you might actually try to understand what I’ve done in NKS. That would
surely help.

— Stephen

Subject: Delayed reply

Date: Thu, 3 Apr 2003 23:34:17 -0500

From: Edward Fredkin

To: Stephen Wolfram

Stephen,

I have been traveling and more recently have had my time gobbled up by
a most urgent matter.

I appreciate your quick reply to my email and I will get back to you
sometime soon. Rather than trying to respond to everything you brought up,
I will be limited to dealing with a couple of issues at a time.

What I can tell you is that I am not angry, and was not angry or upset.
I have always been a non-emotional observer with regard to whatever it is
that comes my way. That’s just my nature. It has come in handy at times,
such as when someone’s stupid mistake caused the single engine jet fighter
I was flying have an engine fire on take off. This required shutting down the
engine and taking other drastic actions very quickly; no time to get mad.

The gist of my comments to you was not related to the work you
documented in NKS, but rather to the style and methodology you are using
while trying to get people to understand and appreciate what it is you
have done. I certainly agree with the fact that it is extraordinarily
difficult to get the scientific establishment to pay attention, listen,
understand and appreciate what you’ve done. Nevertheless, I think there
might be a better approach to that problem than the one you are following.

So, as soon as I can get a little breathing room, I’ll respond to some
of your comments. I do value our friendship and whatever I do in this regard
will be an attempt at honest and unemotional communication with the goal of
some better mutual understanding.

By the way, I have taken the time to read and understand what you’ve
done in NKS. I’m pretty sure that I am better able than most to appreciate
the effort, persistence and creativity that went into that work.

You have made some comments about me and my own work, and I wonder what
you actually know about it beyond our conversations and the things you
referenced in NKS.

As soon as I can get some time I’ll continue with some further thoughts.

Best regards,

Ed didn’t send the promised followup. But a couple of months later New Scientist sent our media email address a note titled: “cover feature on Fredkin, Wolfram right to reply”, which asked for “comment on the suggestion that you first became familiar with cellular automata first at Fredkin’s lab in the 1970s and that examples in A New Kind of Science came out of work done in the lab”. I told Ed he should correct that—and he responded to me:

Subject: Re: cover feature on Fredkin, Wolfram right to reply.

Date: Thu, 29 May 2003 13:55:02 -0400

From: Ed Fredkin

To: Stephen Wolfram

Stephen,

I carefully and clearly told the author of the NS article that to my
knowledge it is not true “… that Wolfram first became familiar with
cellular automata at Fredkin’s lab in the ’70’s…” and further that
you already knew about CA’s.

My guess is that magazines see value in controversy and they would like
to attribute statements to each of us that helps them titillate their
readers. I tried in every way I could to correct any wrong impressions
the author had. But what they end up doing is beyond my control.

As to cracking the fundamental theory of physics, I did read and did
understand what you wrote about in NKS, however my interests lie in
models that are regular and based on a simple underlying Cartesian
lattice. The models I have been working on for the past few years are
called “Salt” as they are CA’s similar to an NaCl crystal. You can
read about it at www.digitalphilosophy.org.

My approach to being consistent with QM, SR and GR is related to the
fact that CA models of physics can exactly conserve such quantities as
momentum, energy, charge etc. By means of a variant of Noether’s
Theorem, the physics of such CA’s can exhibited the all the symmetries
we currently attribute to physics, but doing so asymptotically at
scales above the lattice.

Thus, in my concept of a theory of physics, translation symmetry,
rotation symmetry etc. would all be violated as we currently understand
is true for time symmetry, parity symmetry and charge symmetry .

No one suggests that you should agree with all my ideas, however your
comment in your prior email to me is unnecessarily condescending:

> “I think it’s a great pity that you’ve never taken
> the time to learn the technical details of physics as it’s
> currently practiced. There’s a lot known. And if you understood
> it, I think you’d be able to tell quite quickly which of
> your ideas are totally naive, and which might actually be interesting.”

What is certain is that there’s no “great pity” necessary. I actually
do know a lot about the technical details of physics. In any case,
thirty years ago Feynman thought that I needed to learn more about
certain aspects of QM. He was specific in what he felt was everything
more that I needed to know (in order to make progress with my CA
ideas). He offered to work with me, which was accomplished during the
course of the year I spent at Caltech (1974-1975). I studied, learned
more about QM and passed the final exam that Feynman gave me. While
we argued a lot, Feynman never accused me of having naive ideas.

As to NKS 2003, it doesn’t make a lot of sense for me to come to be a
member of the audience. If you would like me to participate in some
meaningful way, let me know.

Best regards,

Ed F

And after that exchange, Ed and I basically went back to being as we had been before—having pleasant interactions, without any particular scientific engagement. And in a sense for many years I kept out of Ed’s scientific way—not seriously working on physics again until 2019.

Since 2002 I’d been living in the Boston area, so Ed and I ran into each other more often. And although Ed’s behavior over A New Kind of Science had disappointed and upset me, it gave me a better understanding of Ed as a human being, and a vulnerable one at that.

The Later Ed

It was always a little hard to tell just what was going on with Ed. In July 2003, for example, he wrote to me:

Subject: Gunkel

Date: Thu, 24 Jul 2003 19:07:56 -0400

From: Ed Fredkin

To: Stephen Wolfram

Stephen,

First I must apologize for this long letter. Pat Gunkel sent me an
email telling of your visit. It prompted me (who hardly ever writes
anything) to type up my thoughts for whatever they’re worth.

…

You might be surprised at the number of wise and intelligent people who
really appreciate Pat and his works. Yet after more than 30 years of
fitful, diverse yet nearly continuous support, Pat has come to a
situation that, to him, looks like the end of the line.

…

I, unfortunately, am no longer in a position to personally provide the
kind of modest support that Pat needs to continue his church-mouse kind
of existence.

…

There is no doubt that Pat can be a difficult person to help, but I
notice that he has mellowed with age. Of course, Wolfram Research is
not a charitable institution. But I believe that Pat’s ideas on
ideonomy are really important and that those ideas may form the basis
of interesting future applications. The point of all this is that if
what Pat is doing seems interesting to you, some arrangement with
Wolfram Research might make sense.

(True to form, Gunkel followed up with a very forthright note, including a scathing critique he’d written of A New Kind of Science—as well as of Ed’s theories. That wouldn’t have deterred me, but I couldn’t see anything Gunkel could actually do for us, so I never pursued this.)

But did Ed’s note imply that Ed was running out of money? I’d always assumed some kind of vast business empire lurking in the background, but now I wasn’t sure.

I saw Ed only a few times in the next couple of years—at events like a Festschrift for Sulak and a bat mitzvah for one of Feynman’s granddaughters. But as usual, he was eager to tell stories, some of which I hadn’t heard before—mostly about things far in the past. He said that in the early 1960s John Cocke had stolen the idea of RISC architecture from his murdered friend Ben Gurley, though it was two decades before the idea was taken seriously. He said that around the same time he’d been pulled in by the Air Force to help with analysis of blast waves from nuclear tests (and that story came with descriptions of B-52s doing loop-the-loop maneuvers when they dropped atomic bombs). He said that he’d once demoed the Muse music system (which, he emphasized, he, not Minsky, had invented) to an astonished audience in the Soviet Union. He said that he’d advised Richard Branson on his transatlantic balloon trip, telling him his butane burners weren’t correctly mounted—and in fact they fell off. And so on.

In 2005 Ed told me he’d been working with a programmer in California named Dan Miller (who’d developed audio compression software [and been at the NKS 2003 conference that Ed had been so upset about]) on the new 3D cellular automaton he’d invented that he called the “SALT architecture” because its pattern of updates was like the Na and Cl in a salt crystal.

But then in 2008 Ed told me he’d sold his island—presumably relieving whatever financial issues he’d had before—and suddenly Ed started to show up much more. He told me (as he did quite a few times) that he was working on a book (which never materialized). He told me he was teaching a course at Carnegie Mellon on the “Physics of Theoretical Computation”—which was apparently actually a very-much-as-before “engineering-style” effort to explore building features of physics from a cellular automaton, now with his SALT architecture. He invited me to a dinner at his house in honor of ’t Hooft, photographed here with Ed, me and Sulak:

That fall, Ed came to the Midwest NKS Conference in Indiana, here photographed in a discussion with Greg Chaitin, me and others:

I would interact with Ed quite regularly after that—most often with him telling me about his use of Mathematica and soon Wolfram|Alpha. In 2012 Ed—now aged 78—sent me a nice “I have an idea” email (I made the requested introduction, though I’m not sure if this ever went anywhere):

Subject: Alpha and Problem Solving

Date: Fri, 19 Oct 2012 20:12:01 +0000

From: Edward Fredkin

To: Steve Wolfram

Steve,

The first thing I taught at MIT was a course in general problem solving (in 1968).
I’m now developing a new course on General Problem Solving which I expect to
offer first at Harvard’s HILR program. Part of the motivation came from watching
Joyce struggle with a Harvard course on Chemistry, where a lot of the homework
involved units conversions. I noticed that Alpha promptly solved many of Joyce’s
homework problems including some involving chemical reactions. (The course
was really for students planning to take the MCAT Exam in order to get into Medical
School). One clue that you might give to the Alpha developers, is to work toward
getting Alpha to have more of the capabilities necessary to pass different standard
tests that involve various kinds of quantitative analysis. (Of course, you might
have already done so.)

You might recall that I discussed the issue of units conversion with you long ago
(before Mathematica), and you described the idea you then had that turned into
Convert in Mathematica.

In any case, Alpha is fantastic, and getting better all the time. My plan is that every
one of my students must use Alpha for every problem that involves numbers, along
with some that don’t involve numbers. My motto is John McCarthy’s dictum:
“Those who refuse to do arithmetic are doomed to talk nonsense!” However, with
Alpha, the problem solver doesn’t have to do the arithmetic or the units
conversions; Alpha can do it!

It would be helpful if I could get a little bit of cooperation from someone in the Alpha
group. Basically, I will want to talk to an Alpha expert from time to time to make sure
I’m taking advantage of the best that Alpha can do along with resources already
developed for introducing Alpha to new users. My initial students will be drawn from
a group of retirees who, while clearly above average in intelligence, may have few
recently used skills in mathematics. I also expect that almost all of my initial
students will be first time Alpha users. Again, I might profit from discussion
with someone who has thought about how to introduce Alpha to beginners.

Let me know what you think or, if you like, we could get together to talk about it.

Best regards and Congratulations!

In 2014, when I recorded some oral history with Ed—now age 80—he was again brimming with ideas. The one he was most excited about had to do with weather prediction. It started from the observation that most smartphones have pressure sensors in them. Ed’s idea was to use these—and more—to create a sensor net that would continuously collect billions of pressure measurements, to be fed as input to weather forecast codes. Channeling his lifelong interest in reversible computing, he imagined that the codes could be made reversible, and that running backwards from an incorrect prediction could tell one where more data had to be collected. Then Ed imagined doing this by having tiny balloons all over the place—with nothing that would cause trouble if a plane ran into it. He had a whole plan for partners he wanted to get (and, yes, he wanted us to be part of it too). And in typical Ed fashion, it was all laced with stories:

You know, I had this personal experience with weather. I was flying a glider along at 16,000 feet, and I encountered sink. You know, sink is wind blowing down. And the speed of the sink was 10,000 feet a minute. I was at 16,000 feet. And two minutes later, I was on the ground landing. Not on purpose. You know my attitude was—if I don’t see a big grading on the ground—[the wind] can’t keep going this way all the way down, so I won’t be killed. Actually, in that same storm, one of the pilots was killed.

…

The weather people just aren’t into the vertical movement of air. They do everything in layers. But this went through a lot of layers all at once in an organized fashion. So the point is that to talk about thousands or even millions of sensors makes no sense. You’re not going to do good weather until you get billions of sensors. That’s my opinion.

We talked about whether sensitive dependence on initial conditions destroys all predictability in fluid dynamics. I have theoretical and computational reasons to think it doesn’t. But Ed had a story:

There’s a mountain in California I happen to know, and I have a picture of a cloud street that starts on that mountain because it has a very peculiar geometry, and then runs for 2,000 miles.

So this particular mountain has an area of its rock that faces towards the east and it’s big. And what happens is when the Sun is shining on that and the wet wind is coming from the Pacific and so on, you get this big cumulus cloud that flows back this way, and then you get another one and it pulses. You get one after another. And these are very stable things and they travel a very long way. So my point is that amidst all the randomness there’s a lot of order that can be found and understood. There are regions that have funny properties. They’re much more temperature stable. There’s like islands of stability. And things like that get ignored by everything people are doing today, you know what I mean?

I would send things I’d written to Ed. I didn’t really think he’d read them. But I thought he might at least enjoy their concepts. And often he would respond with ideas of his own. I sent him an announcement about our Tweet-a-Program project (now reconfigured because of Twitter changes) with the one-line comment (reflecting his “best programmer” self-characterization): “A new frontier of programming prowess?” He responded, in typical Ed fashion, with an idea—that’s actually a little reminiscent of modern AI image generation:

Subject: Re: Tweet-a-Program

Date: Fri, 19 Sep 2014 21:25:47 +0000

From: Edward Fredkin

To: Stephen Wolfram

Hi,

I like it! As usual, it gave me ideas that might be outside of your
current concept.

We should talk sometime, so that I can explain something closely related
to [Tweet-a-Program] but decidedly different and perhaps even more fun.
Strangely, it has to do with Haiku.

What I have figured out is that there could be a new kind of Haiku, where
the text is interpreted by Mathematica to generate an image.
…
The trick will be having the image reflect something of the Haiku meaning,
even if only abstractly. I don’t know how to do this so that it does the perfect
thing every time, but I have thought of something that could be fun, and a
person could become skilled at creating Mathematica Haikus that seem to reflect
some aspects of the feeling of the words in an image with some increasing
probability of doing it well, as a result of practice.

…

Late in 2014 Ed sent me another piece of mail saying he was starting a project to produce a “new cellular automaton system”—and he wanted to use our technology to do it. He also sent me a paper he’d written about his SALT cellular automaton:

Finally—and without my help—Ed seemed to have mastered the art of academic papers. This one was on the arXiv preprint server. Others—with titles like “An Introduction to Digital Philosophy”—had appeared in academic journals. (Ones with titles like “A New Cosmogony” and “Finite Nature” were more privately circulated.) But what most struck me about this particular paper was that—for the first time—it seemed to have actual images of cellular automaton behavior. I hadn’t seen Ed show anything like that since those few minutes with the PERQ computer on his island in 1982. And now Ed was again chasing that old question Minsky had asked, of making a circle with a cellular automaton.

At the time, I didn’t have a chance to see what Ed had actually done, and whether he’d finally solved it. But in writing this piece, I decided I’d better try to find out. The actual rule—that Ed and Dan Miller called “BusyBoxes”—is quite complicated, involving knight’s-move neighborhoods, etc. Their claim was that starting with a string of cells in a particular configuration, the average of their positions would trace out what in the limit of a long string would be a circle:

At first it looks like a kind of magic trick (and no, nothing is bouncing off any “walls”; the direction changes are just a consequence of the initial pattern of cells). But if you keep all the locations that get visited, things start to seem less mysterious—because what you realize is that the “basket” that gets “woven” is actually just a cube, viewed from a corner:
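As a hedged illustration of the geometry only (not of the BusyBoxes rule itself, whose details aren't reproduced here), the short Python sketch below projects the eight corners of a unit cube onto the plane perpendicular to its body diagonal, which is what viewing a cube "from a corner" amounts to. Six corners land on a regular hexagon around the center, and the two corners on the viewing axis land in the middle, which is why a cube seen this way has a roundish, six-fold outline. The function name and the particular in-plane basis vectors are assumptions made for this sketch.

```python
import itertools
import math

def project_cube_along_diagonal():
    """Project the 8 corners of the unit cube onto the plane through its
    center perpendicular to the body diagonal (1, 1, 1).

    Returns (x, y) coordinates in an orthonormal basis of that plane.
    (Hypothetical helper, written only for this illustration.)
    """
    center = (0.5, 0.5, 0.5)
    # An orthonormal basis of the plane perpendicular to (1, 1, 1).
    e1 = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    e2 = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    points = []
    for corner in itertools.product((0, 1), repeat=3):
        d = [c - m for c, m in zip(corner, center)]
        # Taking components along e1 and e2 discards the part along the diagonal.
        x = sum(di * ei for di, ei in zip(d, e1))
        y = sum(di * ei for di, ei in zip(d, e2))
        points.append((x, y))
    return points

pts = project_cube_along_diagonal()
# Corners (0,0,0) and (1,1,1) sit on the viewing axis and project to the center.
rim = [p for p in pts if math.hypot(*p) > 1e-9]
radii = [math.hypot(x, y) for x, y in rim]
angles = sorted(math.degrees(math.atan2(y, x)) % 360 for x, y in rim)
# Angular gaps between consecutive rim corners, wrapping around once.
gaps = [b - a for a, b in zip(angles, angles[1:] + [angles[0] + 360])]
print(len(rim), [round(r, 4) for r in radii], [round(g, 1) for g in gaps])
```

Every rim corner comes out at the same distance, sqrt(2/3) ≈ 0.8165, from the center, and consecutive rim corners are 60° apart, i.e. the outline is a regular hexagon.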

Where does the apparent circle come from? The details are a bit complicated—and I’ve put them in an appendix below. But suffice it to say that Ed’s old nemesis—calculus—comes in very handy. And in fact it lets one show that although one gets almost a circle, it’s not quite a circle; even with an infinite string, its radius is still wiggling by about 0.5% as one goes around the “circle”:

And—as we’ll see below—remarkably enough one can get a closed-form result for the amount of wiggliness (here computed as the ratio of maximum to minimum radius):

In earlier years, Ed might have tried to say that generating a circle (which this doesn’t) was tantamount to showing that a cellular automaton could reproduce physics. But by now I think he realized that it was really much more complicated than that. And he wasn’t mentioning physics much to me anymore. But—perhaps not least because many of his longtime interlocutors had by then died—he was interacting with me more than before. And perhaps he was even beginning to think that I might have a bit more to contribute than he’d assumed.

In December 2015 I sent Ed a piece I’d written to celebrate the bicentenary of Ada Lovelace, and he responded:

Date: Fri, 11 Dec 2015 15:58:14 +0000

From: Edward Fredkin

To: Stephen Wolfram

Stephen,

I was truly blown away by your essay re Ada Lovelace! You’ve got a lot
more to give the world than I had imagined, and I, more than anyone else,
appreciate what you might still be capable of accomplishing.

It’s too bad that some persons at MIT, for far too long, hung onto one
dimensional views focussed on what Macsyma might have been. My own
impressions have always been different, I recognized your potential long
ago and consequently invited you to one of my Mosqito Island conferences
some 3.5 decades ago.

In any case, much of what Mathematica makes possible is very important and
valuable to me. As you know I was an early user and continue to be a
user.

Many of my interests have run along many paths opened up by activities you
have instigated at Wolfram. Wolfram Alpha and its connections to Siri,
are examples.

Your new book “An Elementary Introduction to the Wolfram Language” (I
don’t yet have a hard copy) fits in with a project I had in mind for my
grandchild Robert, who at age 6 already seems to be extraordinarily
talented mathematically.

To cut to the chase, I want to make a proposal: Although I’m too old to
be a regular employee, I’d nevertheless like to have an association with
Wolfram, where I might be able to contribute ideas, and solve problems
(I’m still quite good at that).

I won’t need much from you other than your opening the door to my
involvement at Wolfram. What I have in mind would be an arrangement
where I could work for Wolfram, with some kind of arrangement other than
full time employment.

I’ve attached something I wrote recently.

Gosh! That was an unexpected development. Flattering, I suppose. But my main reaction was a kind of sadness. Yes, after all these years, Ed had finally read something I’d written. But somehow his response sounded like he was surrendering. This wasn’t the “I-want-to-do-everything-for-myself” Ed I had known all this time. This was an Ed who somehow felt he needed us to support him. And while our company has been able to absorb a great many “unusual” people—with terrific success—Ed seemed like he was pretty far outside our envelope.

At the time, I didn’t look at the attachment Ed sent with his email. But opening it now adds to my sense of sadness. It was a 13-page document about a system Ed imagined that would help people with “various forms of cognitive disabilities”, including a section on “Dementia and Alzheimer’s”:

It wasn’t until 2017 that Ed explicitly mentioned to me that his short-term memory was failing—though in talking to him it had been increasingly obvious for several years. He said he’d joined a group of people who were writing their memoirs. I told him I’d look forward to seeing his, though I’m not sure he ever made much progress on them.

Ed continued to send me ideas and proposals. There was a very Ed-like “global idea” about creating a system “GM” (presumably for “General Mathematician”) that would effectively “learn all of mathematics” by automatically reading math books, etc. (yes, definite overtones of what’s happening with LLM-meets-Wolfram-Language):

Later there were several pieces of mail about a new idea for factoring integers. In the first of them (from 2016), Ed told me that when the NeXT computer first came out (in 1989) he’d used Mathematica on it to simulate a reversible hardware multiplier. And being reminded of this by a historical piece I’d written, he said it had “started me thinking, again, about that problem and I had a new insight that appears to so greatly reduce the complexity of a reversible multiplier so as to possibly make it better at factoring large integers than current algorithms.” He wrote me about this several more times, suggesting various kinds of collaborations. Finally, in 2018 he told me how the method worked, saying it involved doing reversible arithmetic using balanced ternary. (Strangely enough, years earlier Ed had told me about Soviet computers that also used balanced ternary.)
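Ed's actual multiplier design isn't described here, so the following Python sketch only illustrates balanced ternary itself: each digit is -1, 0 or +1, every integer (negative ones included) has exactly one such representation without needing a separate sign, and negating a number just flips the sign of every digit. The function names are hypothetical, and nothing here should be read as Ed's factoring method.

```python
def to_balanced_ternary(n):
    """Return the balanced-ternary digits of integer n, least significant first.

    Each digit is -1, 0 or +1, and n == sum(d * 3**i for i, d in enumerate(digits)).
    """
    digits = []
    while n != 0:
        r = n % 3  # Python's % gives r in {0, 1, 2}, even for negative n
        if r == 2:
            r = -1  # write 2 as 3 - 1: digit -1, with a carry into the next place
        digits.append(r)
        n = (n - r) // 3  # exact division, since n - r is a multiple of 3
    return digits

def from_balanced_ternary(digits):
    """Inverse of to_balanced_ternary."""
    return sum(d * 3**i for i, d in enumerate(digits))

def negate(digits):
    """Negation needs no borrows or carries: just flip every digit."""
    return [-d for d in digits]
```

For example, `to_balanced_ternary(5)` returns `[-1, -1, 1]` (least significant digit first), since 5 = -1 + (-1)·3 + 1·9, and `negate` of that gives the digits of -5.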

I think that was the last technical conversation I had with Ed. A couple of years later I sent him the book about our Physics Project with the inscription:

And I would see him at least once a year at the Boston-area physics get-together organized by Boston University. He would always tell me stories. Often the same stories, and sometimes stories about me. And indeed as I was writing this piece I actually found a video Ed made in November 2020 that has such a story, albeit by this point seriously muddled (and, no, I’ve basically never “run” a cellular automaton by hand in my life!):

I used to organize meetings in the Caribbean and I did this because I had an island in the Caribbean … I invited Wolfram to come down. Wolfram had done pioneering work in cellular automata. … He was a great guy, you know, and I wanted him to get on the bandwagon … He shows up at the meeting and he had done all his work by hand as had everyone else in cellular automata. He didn’t think of using a computer. [!] I had a display processor that I modified to be able to run a cellular automaton with the stuff that it used to put text up on the screen. And so I’m showing him a cellular automata running at 60 frames a second continuously like a movie. This was 10,000 times faster than doing it by hand which is what he’d always done. He never thought of using a computer to do cellular automata and he turns around and walks out and and he left the island and went back to someplace else. So [later] I went to his meeting at Los Alamos and I ran into him again and he was now doing computer work. And I said to him “How come in all your work you don’t have a reversible [rule]”, and he says to me “Oh, reversible ones are all trivial”. And I went up and this is the most telling thing about his intellect: he’s a very smart guy [and when I] showed him how he could change his rule slightly and make it reversible his eyes just about popped out of his head and he knew I was correct.

I may have introduced him to this field but what he has done is he is far better than I at getting other people involved. I’ve never bothered and I don’t have the talent that he has for that. What he did was he came up with similar ideas and initially he didn’t give me the credit I thought I deserved. But it became apparent to me that he did this independently and he’s better at writing things and better at hiring bright people who can do things than I ever was.

And right after that, Ed ends the video with:

As I look back on my career I’ve had a fantastic life and I’m not unhappy about any aspect of it because, you know, I’ve accomplished everything I might have done and in spite of various handicaps—like not being a writer—I still have done a lot and the world, uh, understands me, I think, and appreciates what I’ve done.

When I saw Ed in 2022 he wasn’t able to say much. But, though it was a struggle, he was keen to make one point to me, that seemed to matter a lot to him: “You’ve managed to get people to follow you”, he said, “I was never able to do that”. I saw Ed one last time this May. Joyce explained that Ed had “bumped his head”, and, in a very Ed-like way, she was avoiding a repeat by getting him to wear a bike helmet. She wanted someone to snap a picture of me and her with Ed:

Six weeks later, Ed died, at the age of 88.

I went to see Joyce and Rick a few weeks later, among other things to check facts for this piece. I’d heard from Ed that his ancestors had provided wood for the imperial palace in St. Petersburg. But I’d also heard from someone else that Ed had said he was descended from Mongolian royalty. And as I was about to leave, I thought I might as well ask. “Oh yes”, they said. “And Ed’s father even wrote a historical novel about it”. And they showed me two books (both from the mid-1980s):

I’m not sure who Sarah, Queen of Mongolia was, but the book blurb claims that Ed’s father was her great-great-great-grandson—and goes on to speak of the “strong family inheritance of a mind that analyzes not only the injustice of human oppression but offers realistic and beneficial solutions”.

Summing Up Ed

“Can that really be true?” I often asked myself when hearing yet another of Ed’s implausible stories. And of course it didn’t help that stories he told—even to me—about me weren’t true. But the remarkable thing in writing this piece is that I’ve been able to verify that a lot of Ed’s stories—implausible though they may have sounded—were in fact true. Yes, they were often embellished, and parts that didn’t reflect so well on Ed were omitted. But together they defined a remarkable tapestry of a life.

It was in many ways a very independent life. Ed had friends and family members to whom he stayed close throughout his life. But mostly it was “Ed for himself, against the world”. He didn’t want to learn anything from anyone else; he wanted to figure out everything for himself. He wanted to invent his own ideas; he wasn’t too interested in other people’s. In a rather Air-Force-pilot kind of way (“eject or not?”) he liked to be decisive—and he liked to be incisive too, always figuring out a clear, simple thing to say. Sometimes that came across as naive. And sometimes it was in fact naive. But mostly Ed didn’t seem to mind much; he would just go on to another idea.

Ed was a great storyteller, and an engaging speaker. For some reason he developed the theory that he couldn’t write—but there’s ample evidence, going back even to his teenage years, that this wasn’t true. If there was a problem, it was with content, not writing. And the issue with the content was that it tended to just be too Ed-specific—too insular—and not connected enough for other people to be able to understand or appreciate it.

I don’t know what Ed was like as a manager; I rather suspect he may have suffered from trying to be a bit too clever, with too many ideas and too much gamification. In the end, he felt he’d failed as a leader, and perhaps that was inevitable given how independent he always wanted to be. Despite his stints as an academic administrator and as a CEO, Ed was in the end fundamentally a lone warrior (and problem solver), not a general.

And what about all those ideas? Most never developed very far. Some were pretty wild. But many had at least a kernel of visionary insight. The details of the universe as a cellular automaton didn’t make sense. But the idea that the universe is somehow computational is surely correct. And spread over the course of more than six decades, Ed spun out nuggets of ideas that would later appear—usually much more developed—in a remarkable range of areas.

Ed projected a kind of personal serenity—yet he was in many ways deeply competitive. Most of the time, though, he was able to define the arena of his competitiveness so idiosyncratically that there really weren’t other contenders. And I think in the end Ed felt pretty good about all the things he’d managed to do in his life. It was fitting that he owned an actual island. Because somehow an island was a metaphor for Ed’s life: separate, independent and unique.

Thanks

I’ve had help with information for this piece from many people, including Joyce Fredkin, Rick Fredkin, Simson Garfinkel, Andrea Gerlach, Bill Gosper, Howard Gutowitz, Steven Levy, Norm Margolus, Margaret Minsky, Dave Moon, John Moussouris, Mark Nahabedian, Walter Parkes, David Reiss, Brian Silverman, George Sulak, Larry Sulak and Matthew Szudzik. (Tom Toffoli agreed to talk, but didn’t show up.) I thank the Department of Distinctive Collections at the MIT Library for access to the Fredkin papers archive there. Thanks also to Brad Klee and Nik Murzin for technical help.

Appendix: Analyzing the Not-Quite-Circle

Here’s what the SALT cellular automaton does for two sizes of initial “string”:

For an initial string of length n (with n > 2), the overall period is 54n – 43, and the envelope “woven” going through all configurations is:

The “circle” is obtained by averaging the positions of all cells present at a given time step. The “circle” is always planar, but its effective radius varies with direction (i.e. as the system steps through each cycle):

Ed and Dan Miller looked at the standard deviation of the effective radius as a function of n, computing it up to n = 20, and getting the following results:
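The measurement described above can be sketched as follows, assuming each time step of the automaton is available as a list of (x, y, z) cell positions. The function names are hypothetical, and returning the standard deviation both raw and relative to the mean radius is our assumption; the text does not say which normalization Ed and Dan Miller used.

```python
import math

def centroid(points):
    """Average position of the cells present at one time step."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def radius_stats(frames):
    """Effective-radius statistics of the "circle" traced by per-step centroids.

    `frames` is a list of time steps, each a list of (x, y, z) cell positions.
    Returns (mean_radius, std_radius, std_radius / mean_radius).
    """
    cents = [centroid(f) for f in frames]
    center = centroid(cents)  # center of the whole "circle"
    radii = [math.dist(c, center) for c in cents]
    mean = sum(radii) / len(radii)
    std = math.sqrt(sum((r - mean) ** 2 for r in radii) / len(radii))
    return mean, std, std / mean

# Toy data: two cells per step whose average moves around a circle of radius 2
frames = [
    [(2 * math.cos(a) + d, 2 * math.sin(a), 0.0) for d in (-0.1, 0.1)]
    for a in (2 * math.pi * k / 360 for k in range(360))
]
print(radius_stats(frames))  # mean radius ~2, standard deviation ~0
```

A perfect circle gives a standard deviation of essentially zero; any "wiggle" in the centroid path shows up as a positive value.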

It looked as if the standard deviation was just going to go smoothly to zero—so that for an infinite string one would get a perfect circle. But that turns out not to be true, as one can see by extending the computation to slightly larger values of n:

And actually there’s a minimum at n = 43, with standard deviation 0.0012 (and fractional size discrepancy 0.0048)—and it doesn’t look like even for n → ∞ one will get a perfect circle.

But how can one work out the n → ∞ case? It’s actually a nice application of calculus.

First, notice that the “basket” consists of a series of layers of a cube viewed from one of its corners, or in other words a sequence of shapes like this:

Here’s how these are formed as one sweeps through the cube:

One can think of the string in the cellular automaton as spanning these “layers”, and successively moving around all of them as the cellular automaton evolves. In the continuum limit, there’s effectively a parameter t that defines where on each “layer curve” one is at a particular time. Conveniently enough, the length of all the layer curves is the same (for a unit cube it is 3√2 ≈ 4.2). With successive layers parametrized by a variable s (running from 0 to 1) the corners of the layer curves (all normalized to have length 1) are given by:
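The claim that all layer curves have the same length can be checked numerically. This is a minimal sketch assuming the layers are the slices x + y + z = 1 + s of the unit cube for s from 0 to 1 (a triangle, then hexagons, then the opposite triangle); that mapping from s to a slice is our assumption. In the hexagons, alternate sides have lengths s√2 and (1 − s)√2, so three of each always add up to 3√2.

```python
import itertools
import math

SQRT2 = math.sqrt(2)

def slice_vertices(c):
    """Corners of the polygon where the plane x + y + z = c cuts the unit cube."""
    corners = list(itertools.product((0, 1), repeat=3))
    pts = []
    for a, b in itertools.combinations(corners, 2):
        if sum(ai != bi for ai, bi in zip(a, b)) != 1:
            continue  # not one of the 12 cube edges
        sa, sb = sum(a), sum(b)
        lam = (c - sa) / (sb - sa)  # where along this edge the plane crosses it
        if 0 <= lam <= 1:
            p = tuple(ai + lam * (bi - ai) for ai, bi in zip(a, b))
            if all(math.dist(p, q) > 1e-12 for q in pts):  # skip repeats at cube corners
                pts.append(p)
    return pts

def slice_perimeter(c):
    """Perimeter of the slice, with vertices ordered by angle around their centroid."""
    pts = slice_vertices(c)
    cx, cy, cz = (sum(p[i] for p in pts) / len(pts) for i in range(3))
    # orthonormal basis of the plane perpendicular to (1, 1, 1)
    u = (1 / SQRT2, -1 / SQRT2, 0.0)
    v = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))

    def angle(p):
        d = (p[0] - cx, p[1] - cy, p[2] - cz)
        return math.atan2(sum(x * y for x, y in zip(d, v)), sum(x * y for x, y in zip(d, u)))

    ring = sorted(pts, key=angle)
    return sum(math.dist(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring)))

for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, round(slice_perimeter(1 + s), 6))  # 4.242641 every time
```

Slices nearer a corner (x + y + z < 1) are smaller triangles whose perimeter is not constant, which is why the sketch only sweeps the middle range.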

Now we need to find the actual x, y positions of string elements (AKA infinitesimal cells) as a function of s and t. Since the edges of the layer polygons are always straight, in each of a series of “piecewise regions” in s and t (with breakpoints defined by the corners of the polygons), we get expressions for x and y that are linear in s and t:

One subtlety is that the string in essence turns as time progresses, so that it effectively samples a different t value for different layers s. To correct for this, we have to find for which t we get x = 0 for a given s. It’s convenient to put the center of all our layer curves at {0, 0}, and we can do this now by subtracting . Then the (first) value of t at which x = 0 is given simply by:

The parametric surface we now get as a function of t is (with discrete lines indicating particular values of s):

Now we can slice the parametric surface not in discrete s values but instead in discrete t values—thus getting what’s basically a sequence of effective strings at discrete times:

The centroids of the strings are indicated in green, and these are then points on our potential circle. Using what we did above, the radius of this “circle” as a function of t can then be found by integrating over s. The result is algebraically complicated, but has a closed form:

Integrating this over t we get the “average radius”, normalized to “circumference 1” from the fact that t varies from 0 to 1 going “around the circle”:

(This means that the “effective π” for this circle is about 3.437.)
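Spelled out, and assuming the “effective π” means the circumference divided by twice the average radius, the relation is:

```latex
\bar r = \int_0^1 r(t)\,dt,
\qquad
\pi_{\mathrm{eff}} = \frac{\text{circumference}}{2\,\bar r} = \frac{1}{2\,\bar r}
```

so an effective π of about 3.437 would correspond to an average radius of about 1/(2 × 3.437) ≈ 0.145 in these units.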

Now we can plot the “wiggle” of the radius as a function of “angle” (i.e. t):

It looks a bit like a sine curve, but it’s not one. And, for example, it isn’t even symmetrical. Its maxima (which occur at odd multiples of 30°) are

while its minima (at even multiples of 30°) are

and dividing by the average radius these are about 1.00734 and 0.992175.

The ratio of maximum to minimum (effectively “wiggle amplitude”) is:
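As a quick consistency check using only the rounded normalized extremes quoted above (not the exact closed form): since both were divided by the same average radius, their ratio equals the ratio of the raw maximum and minimum.

```python
# Normalized extreme radii quoted in the text (each divided by the average radius)
r_max_norm = 1.00734
r_min_norm = 0.992175

# The common normalization cancels, so this is also the raw max/min ratio
ratio = r_max_norm / r_min_norm
print(round(ratio, 4))  # 1.0153, i.e. a wiggle of roughly 1.5%
```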

Meanwhile, the standard deviation can be obtained as an integral over t, and the final result is

which is about 2.4 times larger than what we get at n = 100. We can see the approach to the asymptotic value by computing integrals over t for progressively larger numbers of discrete values of s (which, we should emphasize, is similar to values of n, but not quite the same, particularly for small n):
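For reference, here is the quantity written out, assuming it is the ordinary root-mean-square deviation of r(t) from its mean over one cycle (the text does not say whether the value shown is further divided by the average radius):

```latex
\sigma = \sqrt{\int_0^1 \bigl(r(t) - \bar r\bigr)^2\,dt},
\qquad
\bar r = \int_0^1 r(t)\,dt
```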

Generative AI Space and the Mental Imagery of Alien Minds

Stephen Wolfram — Mon, 17 Jul 2023 20:47:56 +0000


AIs and Alien Minds

How do alien minds perceive the world? It’s an old and oft-debated question in philosophy. And it now turns out to also be a question that rises to prominence in connection with the concept of the ruliad that’s emerged from our Wolfram Physics Project.

I’ve wondered about alien minds for a long time—and tried all sorts of ways to imagine what it might be like to see things from their point of view. But in the past I’ve never really had a way to build my intuition about it. That is, until now. So, what’s changed? It’s AI. Because in AI we finally have an accessible form of alien mind.

We typically go to a lot of trouble to train our AIs to produce results like those we humans would produce. But what if we take a human-aligned AI, and modify it? Well, then we get something that’s in effect an alien AI—an AI aligned not with us humans, but with an alien mind.

So how can we see what such an alien AI—or alien mind—is “thinking”? A convenient way is to try to capture its “mental imagery”: the image it forms in its “mind’s eye”. Let’s say we use a typical generative AI to go from a description in human language—like “a cat in a party hat”—to a generated image:

It’s exactly the kind of image we’d expect—which isn’t surprising, because it comes from a generative AI that’s trained to “do as we would”. But now let’s imagine taking the neural net that implements this generative AI, and modifying its insides—say by resetting weights that appear in its neural net.

By doing this we’re in effect going from a human-aligned neural net to some kind of “alien” one. But this “alien” neural net will still produce some kind of image—because that’s what a neural net like this does. But what will the image be? Well, in effect, it’s showing us the mental imagery of the “alien mind” associated with the modified neural net.

But what does it actually look like? Well, here’s a sequence obtained by progressively modifying the neural net—in effect making it “progressively more alien”:

At the beginning it’s still a very recognizable picture of “a cat in a party hat”. But it soon becomes more and more alien: the mental image in effect diverges further from the human one—until it no longer “looks like a cat”, and in the end looks, at least to us, rather random.

There are many details of how this works that we’ll be discussing below. But what’s important is that—by studying the effects of changing the neural net—we now have a systematic “experimental” platform for probing at least one kind of “alien mind”. We can think of what we’re doing as a kind of “artificial neuroscience”, probing not actual human brains, but neural net analogs of them.

And we’ll see many parallels to neuroscience experiments. For example, we’ll often be “knocking out” particular parts of our “neural net brain”, a little like how injuries such as strokes can knock out parts of a human brain. But we know that when a human brain suffers a stroke, this can lead to phenomena like “hemispatial neglect”, in which a stroke victim asked to draw a clock will end up drawing just one side of the clock—a little like the pictures of cats “degrade” when parts of the “neural net brain” are knocked out.

Of course, there are many differences between real brains and artificial neural nets. But most of the core phenomena we’ll observe here seem robust and fundamental enough that we can expect them to span very different kinds of “brains”—human, artificial and alien. And the result is that we can begin to build up intuition about what the worlds of different—and alien—minds can be like.

Generating Images with AIs

How does an AI manage to create a picture, say of a cat in a party hat? Well, the AI has to be trained on “what makes a reasonable picture”—and how to determine what a picture is of. Then in some sense what the AI does is to start generating “reasonable” pictures at random, in effect continually checking what the picture it’s generating seems to be “of”, and tweaking it to guide it towards being a picture of what one wants.

So what counts as a “reasonable picture”? If one looks at billions of pictures—say on the web—there are lots of regularities. For example, the pixels aren’t random; nearby ones are usually highly correlated. If there’s a face, it’s usually more or less symmetrical. It’s more common to have blue at the top of a picture, and green at the bottom. And so on. And the important technological point is that it turns out to be possible to use a neural network to capture regularities in images, and to generate random images that exhibit them.
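To make “nearby pixels are highly correlated” concrete, here is a minimal sketch comparing the correlation between neighboring values in pure noise and in a smoothed signal. The smoothed row is a synthetic stand-in for a row of pixels from a natural image, not real image data.

```python
import random

def lag1_correlation(xs):
    """Pearson correlation between each value and its right-hand neighbor."""
    a, b = xs[:-1], xs[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)
noise = [rng.random() for _ in range(2000)]
# Crude "natural" row of pixels: a running average of 5 noise values
smooth = [sum(noise[i:i + 5]) / 5 for i in range(len(noise) - 4)]

print(round(lag1_correlation(noise), 2))   # close to 0
print(round(lag1_correlation(smooth), 2))  # close to 0.8
```

For a running average over 5 independent values, neighbors share 4 of their 5 inputs, which is where the expected correlation of about 0.8 comes from.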

Here are some examples of “random images” generated in this way:

And the idea is that these images—while each is “random” in its specifics—will in general follow the “statistics” of the billions of images from the web on which the neural network has been “trained”. We’ll be talking more about images like these later. But for now suffice it to say that while some may just look like abstract patterns, others seem to contain things like landscapes, human forms, etc. And what’s notable is that none just look like “random arrays of pixels”; they all show some kind of “structure”. And, yes, given that they’ve been trained from pictures on the web, it’s not too surprising that the “structure” sometimes includes things like human forms.

But, OK, let’s say we specifically want a picture of a cat in a party hat. From all of the almost infinitely large number of possible “well-structured” random images we might generate, how do we get one that’s of a cat in a party hat? Well, a first question is: how would we know if we’ve succeeded? As humans, we could just look and see what our image is of. But it turns out we can also train a neural net to do this (and, no, it doesn’t always get it exactly right):

How is the neural net trained? The basic idea is to take billions of images—say from the web—for which corresponding captions have been provided. Then one progressively tweaks the parameters of the neural net to make it reproduce these captions when it’s fed the corresponding images. But the critical point is that the neural net turns out to do more: it also successfully produces “reasonable” captions for images it’s never seen before. What does “reasonable” mean? Operationally, it means captions that are similar to what we humans might assign. And, yes, it’s far from obvious that a computationally constructed neural net will behave at all like us humans, and the fact that it does is presumably telling us fundamental things about how human brains work.

But for now what’s important is that we can use this captioning capability to progressively guide images we produce towards what we want. Start from “pure randomness”. Then try to “structure the randomness” to make a “reasonable” picture, but at every step see in effect “what the caption would be”. And try to “go in a direction” that “leads towards” a picture with the caption we want. Or, in other words, progressively try to get to a picture that’s of what we want.

The way this is set up in practice, one starts from an array of random pixels, then iteratively forms the picture one wants:

Different initial arrays lead to different final pictures—though if everything works correctly, the final pictures will all be of “what one asked for”, in this case a cat in a party hat (and, yes, there are a few “glitches”):
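The “start from noise, then keep steering toward the caption” loop can be illustrated with a deliberately tiny stand-in. This sketch is not the mechanism of the real generative AI: the “captioner” is a hypothetical fixed linear map W from an 8-pixel “image” to a 2-number feature vector, and the steering is plain gradient descent on the mismatch with a hypothetical target feature vector. It does reproduce one point from the text: different random starting arrays end at different final arrays, all of which match the target.

```python
import random

# Hypothetical linear "captioner": 8 pixels -> 2 features (rows are orthonormal)
W = [
    [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5],
]
TARGET = [1.0, -0.5]  # stand-in for the features of the wanted caption

def features(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mismatch(x):
    return sum((f - t) ** 2 for f, t in zip(features(x), TARGET))

def refine(seed, steps=60, lr=0.5):
    """Start from random pixels and repeatedly nudge them toward TARGET."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(steps):
        err = [f - t for f, t in zip(features(x), TARGET)]
        # Step along -W^T err (the gradient of mismatch is 2 W^T err; the 2 is folded into lr)
        x = [xi - lr * sum(W[k][i] * err[k] for k in range(len(W))) for i, xi in enumerate(x)]
    return x

a, b = refine(seed=1), refine(seed=2)
print(mismatch(a) < 1e-12, mismatch(b) < 1e-12)  # True True
print(a != b)  # True: same "caption", different "pictures"
```

Because W has far fewer outputs than inputs, the parts of the starting noise that W cannot “see” are never changed, which is why each seed ends at its own answer.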

We don’t know how mental images are formed in human brains. But it seems conceivable that the process is not too different. And that in effect as we’re trying to “conjure up a reasonable image”, we’re continually checking if it’s aligned with what we want—so that, for example, if our checking process is impaired we can end up with a different image, as in hemispatial neglect.

The Notion of Interconcept Space

That everything can ultimately be represented in terms of digital data is foundational to the whole computational paradigm. But the effectiveness of neural nets relies on the slightly different idea that it’s useful to treat at least many kinds of things as being characterized by arrays of real numbers. In the end one might extract from a neural net that’s giving captions to images the word “cat”. But inside the neural net it’ll operate with arrays of numbers that correspond in some fairly abstract way to the image you’ve given, and the textual caption it’ll finally produce.

And in general neural nets can typically be thought of as associating “feature vectors” with things—whether those things are images, text, or anything else. But whereas words like “cat” and “dog” are discrete, the feature vectors associated with them just contain collections of real numbers. And this means that we can think of a whole space of possibilities, with “cat” and “dog” just corresponding to two specific points.

So what’s out there in that space of possibilities? For the feature vectors we typically deal with in practice the space is many-thousand-dimensional. But we can for example look at the (nominally straight) line from the “dog point” to the “cat point” in this space, and even generate sample images of what comes between:

And, yes, if we want to, we can keep going “beyond cat”—and pretty soon things start becoming quite weird:

We can also do things like look at the line from a plane to a cat—and, yes, there’s strange stuff in there (wings hat ears?):
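Walking along such a line is just linear interpolation between two feature vectors, and going “beyond cat” means letting the interpolation parameter exceed 1. A minimal sketch, with short hypothetical vectors standing in for the real, much longer feature vectors (turning each point into an image is not shown):

```python
def lerp(a, b, t):
    """Point a fraction t of the way from feature vector a to b.

    0 <= t <= 1 stays between them; t > 1 continues past b along the same line.
    """
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

# Hypothetical 4-dimensional stand-ins for the "dog" and "cat" feature vectors
dog = [0.9, 0.1, -0.3, 0.4]
cat = [0.2, 0.7, 0.5, -0.1]

path = [lerp(dog, cat, k / 4) for k in range(5)]            # dog ... cat
beyond = [lerp(dog, cat, 1 + k / 4) for k in range(1, 3)]   # past cat
```

The same function covers the plane-to-cat line: only the starting vector changes.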

What about elsewhere? For example, what happens “around” our standard “cat in a party hat”? With the particular setup we’re using, there’s a 2304-dimensional space of possibilities. But as an example, let’s look at what we get on a particular 2D plane through the “standard cat” point:

Our “standard cat” is in the middle. But as we move away from the “standard cat” point, progressively weirder things happen. For a while there are recognizable (if perhaps demonic) cats to be seen. But soon there isn’t much “catness” in evidence—though sometimes hats do remain (in what we might characterize as an “all hat, no cat” situation, reminiscent of the Texan “all hat, no cattle”).
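Here is a minimal sketch of how one might lay out such a plane, assuming the concept point is available as a plain list of 2304 numbers. The “cat” vector is random placeholder data, the plane is spanned by two random orthonormal directions built with Gram-Schmidt, and rendering each grid point into an image is left out.

```python
import math
import random

DIM = 2304  # dimension quoted in the text

def random_unit(rng, dim):
    v = [rng.gauss(0, 1) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def random_plane(rng, dim):
    """Two orthonormal directions spanning a random 2D plane (Gram-Schmidt)."""
    e1 = random_unit(rng, dim)
    w = random_unit(rng, dim)
    dot = sum(a * b for a, b in zip(w, e1))
    e2 = [a - dot * b for a, b in zip(w, e1)]  # remove the e1 component
    n = math.sqrt(sum(x * x for x in e2))
    return e1, [x / n for x in e2]

def plane_grid(center, e1, e2, radius, steps):
    """Points center + u*e1 + v*e2 on a (2*steps+1) x (2*steps+1) grid."""
    grid = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            u, v = radius * i / steps, radius * j / steps
            grid.append([c + u * a + v * b for c, a, b in zip(center, e1, e2)])
    return grid

rng = random.Random(42)
cat_point = [rng.gauss(0, 1) for _ in range(DIM)]  # placeholder for the "standard cat"
e1, e2 = random_plane(rng, DIM)
grid = plane_grid(cat_point, e1, e2, radius=1.0, steps=2)  # 25 points, cat in the middle
```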

How about if we pick other planes through the standard cat point? All sorts of images appear:

But the fundamental story is always the same: there’s a kind of “cat island”, beyond which there are weird and only vaguely cat-related images—encircled by an “ocean” of what seem like purely abstract patterns with no obvious cat connection. And in general the picture that emerges is that in the immense space of possible “statistically reasonable” images, there are islands dotted around that correspond to “linguistically describable concepts”—like cats in party hats.

The islands normally seem to be roughly “spherical”, in the sense that they extend about the same nominal distance in every direction. But relative to the whole space, each island is absolutely tiny—something like perhaps a fraction 2^–2000 ≈ 10^–600 of the volume of the whole space. And between these islands there lie huge expanses of what we might call “interconcept space”.
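The arithmetic behind such tiny fractions is easiest to follow with logarithms. This sketch checks the 2^–2000 ≈ 10^–600 conversion, and illustrates (as a general geometric fact, not the post’s own calculation) why high dimension makes fractions like this unremarkable: in d dimensions volume scales like radius^d, so a ball with half the radius has only 2^–d of the volume.

```python
import math

# 2.0 ** -2000 underflows ordinary floats to 0.0, so use base-10 logarithms instead
print(round(-2000 * math.log10(2), 2))  # -602.06, i.e. 2^-2000 is about 10^-602

def volume_fraction_log10(radius_fraction, dim):
    """log10 of the volume fraction of a ball scaled down by radius_fraction in dim dimensions."""
    return dim * math.log10(radius_fraction)

print(round(volume_fraction_log10(0.5, 2304), 2))  # -693.57
```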

What’s out there in interconcept space? It’s full of images that are “statistically reasonable” based on the images we humans have put on the web, etc.—but aren’t of things we humans have come up with words for. It’s as if in developing our civilization—and our human language—we’ve “colonized” only certain small islands in the space of all possible concepts, leaving vast amounts of interconcept space unexplored.

What’s out there is pretty weird—and sometimes a bit disturbing. Here’s what we see zooming in on the same (randomly chosen) plane around “cat island” as above:

What are all these things? In a sense, words fail us. They’re things on the shores of interconcept space, where human experience has not (yet) taken us, and for which human language has not been developed.

What if we venture further out into interconcept space—and for example just sample points in the space at random? It’s just like we already saw above: we’ll get images that are somehow “statistically typical” of what we humans have put on the web, etc., and on which our AI was trained. Here are a few more examples:

And, yes, we can pick out at least two basic classes of images: ones that seem like “pure abstract textures”, and ones that seem “representational”, and remind us of real-world scenes from human experience. There are intermediate cases—like “textures” with structures that seem like they might “represent something”, and “representational-seeming” images where we just can’t place what they might be representing.

But when we do see recognizable “real-world-inspired” images they’re a curious reflection of the concepts—and general imagery—that we humans find “interesting enough to put on the web”. We’re not dealing here with some kind of “arbitrary interconcept space”; we’re dealing with “human-aligned” interconcept space that’s in a sense anchored to human concepts, but extends between and around them. And, yes, viewed in these terms it becomes quite unsurprising that in the interconcept space we’re sampling, there are so many images that remind us of human forms and common human situations.

But just what were the images that the AI saw, from which it formed this model of interconcept space? There were a few billion of them, “foraged” from the web. Like things on the web in general, it’s a motley collection; here’s a random sample:

Some can be thought of as capturing aspects of “life as it is”, but many are more aspirational, coming from staged and often promotionally oriented photography. And, yes, there are lots of Net-a-Porter-style “clothing-without-heads” images. There are also lots of images of “things”—like food, etc. But somehow when we sample randomly in interconcept space it’s the human forms that most distinctively stand out, conceivably because “things” are not particularly consistent in their structure, but human forms always have a certain consistency of “head-body-arms, etc.” structure.

It’s notable, though, that even the most real-world-like images we find by randomly sampling interconcept space seem to typically be “painterly” and “artistic” rather than “photorealistic” and “photographic”. It’s a different story close to “concept points”—like on cat island. There more photographic forms are common, though as we go away from the “actual concept point”, there’s a tendency towards either a rather toy-like appearance, or something more like an illustration.

By the way, even the most “photographic” images the AI generates won’t be anything that comes directly from the training set. Because—as we’ll discuss later—the AI is not set up to directly store images; instead its training process in effect “grinds up” images to extract their “statistical properties”. And while “statistical features” of the original images will show up in what the AI generates, any detailed arrangement of pixels in them is overwhelmingly unlikely to do so.

But, OK, what happens if we start not at a “describable concept” (like “a cat in a party hat”), but just at a random point in interconcept space? Here are the kinds of things we see:

The images often seem to be a bit more diverse than those around “known concept points” (like our “cat point” above). And occasionally there’ll be a “flash” of something “representationally familiar” (perhaps like a human form) that’ll show up. But most of the time we won’t be able to say “what these images are of”. They’re of things that are somehow “statistically” like what we’ve seen, but they’re not things that are familiar enough that we’ve—at least so far—developed a way to describe them, say with words.

The Images of Interconcept Space

There’s something strangely familiar—yet unfamiliar—to many of the images in interconcept space. It’s fairly common to see pictures that seem like they’re of people:

But they’re “not quite right”. And for us as humans, being particularly attuned to faces, it’s the faces that tend to seem the most wrong—even though other parts are “wrong” as well.

And perhaps in commentary on our nature as a social species (or maybe it’s as a social media species), there’s a great tendency to see pairs or larger groups of people:

There’s also a strange preponderance of torso-only pictures—presumably the result of “fashion shots” in the training data (and, yes, with some rather wild “fashion statements”):

People are by far the most common identifiable elements. But one does sometimes see other things too:

Then there are some landscape-type scenes:

Some look fairly photographically literal, but others build up the impression of landscapes from more abstract elements:

Occasionally there are cityscape-like pictures:

And—still more rarely—indoor-like scenes:

Then there are pictures that look like they’re “exteriors” of some kind:

It’s common to see images built up from lines or dots or otherwise “impressionistically formed”:

And then there are lots of images that seem like they’re trying to be “of something”, but it’s not at all clear what that “thing” is, and whether indeed it’s something we humans would recognize, or whether instead it’s something somehow “fundamentally alien”:

It’s also quite common to see what look more like “pure patterns”—that don’t really seem like they’re “trying to be things”, but more come across like “decorative textures”:

But probably the single most common type of image is a somewhat uniform texture, formed by repeating various simple elements, though usually with “dislocations” of various kinds:

Across interconcept space there’s tremendous variety to the images we see. Many have a certain artistic quality to them—and a feeling that they are some kind of “mindful interpretation” of a perhaps mundane thing in the world, or a simple, essentially mathematical pattern. And to some extent the “mind” involved is a collective version of our human one, reflected in a neural net that has “experienced” some of the many images humans have put on the web, etc. But in some ways the mind is also a more alien one, formed from the computational structure of the neural net, with its particular features, and no doubt in some ways computationally irreducible behavior.

And indeed there are some motifs that show up repeatedly that are presumably reflections of features of the underlying structure of the neural net. The “granulated” appearance, with alternation between light and dark, for example, is presumably a consequence of the dynamics of the convolutional parts of the neural net—and analogous to the results of what amounts to iterated blurring and sharpening with a certain effective pixel scale (reminiscent, for example, of video feedback):
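The blur-and-sharpen analogy can be tried on a toy one-dimensional “image”. This is only an illustration of the analogy, not a model of the network’s convolutional layers; the [1, 2, 1]/4 kernel, the sharpening strength k = 3 and the clipping to [0, 1] are all assumptions. With these choices, linear analysis says structure with a period of roughly five pixels grows fastest, so repeated passes turn small random fluctuations into alternating light and dark bands at that scale.

```python
import random

def blur(x):
    """Circular [1, 2, 1] / 4 smoothing."""
    n = len(x)
    return [(x[i - 1] + 2 * x[i] + x[(i + 1) % n]) / 4 for i in range(n)]

def blur_then_sharpen(x, k=3.0):
    """One feedback pass: blur, unsharp-mask the blurred row with strength k, clip to [0, 1]."""
    b = blur(x)
    bb = blur(b)
    return [min(1.0, max(0.0, bi + k * (bi - bbi))) for bi, bbi in zip(b, bb)]

rng = random.Random(1)
row = [0.5 + rng.uniform(-0.05, 0.05) for _ in range(64)]  # nearly uniform gray
for _ in range(40):
    row = blur_then_sharpen(row)
print("".join("#" if v > 0.5 else "." for v in row))  # light/dark bands
```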

Making Minds Alien

We can think of what we’ve done so far as exploring what a mind trained from human-like experiences can “imagine” by generalizing from those experiences. But what might a different kind of mind imagine?

As a very rough approximation, we can think of just taking the trained “mind” we’ve created, and explicitly modifying it, then seeing what it now “imagines”. Or, more specifically, we can take the neural net we have been using, and start making changes to it, and seeing what effect that has on the images it produces.

We’ll discuss later the details of how the network is set up, but suffice it to say here that it involves 391 distinct internal modules, involving altogether nearly a billion numerical weights. When the network is trained, those numerical weights are carefully tuned to achieve the results we want. But what if we just change them? We’ll still (normally) get a network that can generate images. But in some sense it’ll be “thinking differently”—so potentially the images will be different.

So as a very coarse first experiment—reminiscent of many that are done in biology—let’s just “knock out” each successive module in turn, setting all its weights to zero. If we ask the resulting network to generate a picture of “a cat in a party hat”, here’s what we now get:
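In code, the knockout experiment is just “copy the weights, zero one module, regenerate”. A minimal sketch, with a hypothetical toy network represented as a list of modules, each a flat list of weights (the real network’s module structure and the image-generation step are not shown):

```python
import copy
import random

def make_toy_network(n_modules=5, weights_per_module=4, seed=0):
    """Hypothetical stand-in: a list of modules, each a flat list of weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(weights_per_module)] for _ in range(n_modules)]

def knock_out(network, index):
    """Copy of the network with every weight in module `index` set to zero."""
    changed = copy.deepcopy(network)
    changed[index] = [0.0] * len(changed[index])
    return changed

net = make_toy_network()
# One variant per module; each would then be asked for "a cat in a party hat"
variants = [knock_out(net, i) for i in range(len(net))]
```

Working on a copy keeps the original network intact, so every variant differs from it in exactly one module.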

Let’s look at these results in a bit more detail. In quite a few cases, zeroing out a single module doesn’t make much of a difference; for example, it might basically only change the facial expression of the cat:

But it can also more fundamentally change the cat (and its hat):

It can change the configuration or position of the cat (and, yes, some of those paws are not anatomically correct):

Zeroing out other modules can in effect change the “rendering” of the cat:

But in other cases things can get much more mixed up, and difficult for us to parse:

Sometimes there’s clearly a cat there, but its presentation is at best odd:

And sometimes we get images that have definite structure, but don’t seem to have anything to do with cats:

Then there are cases where we basically just get “noise”, albeit with things superimposed:

But—much like in neurophysiology—there are some modules (like the very first and last ones in our original list) where zeroing them out basically makes the system not work at all, and just generate “pure random noise”.

As we’ll discuss below, the whole neural net that we’re using has a fairly complex internal structure—for example, with a few fundamentally different kinds of modules. But here’s a sample of what happens if one zeros out modules at different places in the network—and what we see is that for the most part there’s no obvious correlation between where the module is, and what effect zeroing it out will have:

So far, we’ve just looked at what happens if we zero out a single module at a time. Here are some randomly chosen examples of what happens if one zeros out successively more modules (one might call this a “HAL experiment” in remembrance of the fate of the fictional HAL AI in the movie 2001):
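The cumulative version can be sketched the same way: pick a random order of modules and zero one more at each stage. The toy network here is hypothetical placeholder data.

```python
import random

def cumulative_knockouts(network, seed=0):
    """Yield copies of the network with 1, 2, 3, ... modules zeroed,
    adding one more randomly chosen module at each stage."""
    order = list(range(len(network)))
    random.Random(seed).shuffle(order)
    current = [list(m) for m in network]
    for idx in order:
        current[idx] = [0.0] * len(current[idx])
        yield [list(m) for m in current]  # snapshot of this stage

net = [[1.0, -2.0], [0.5, 0.5], [3.0, 1.5]]  # hypothetical 3-module network
stages = list(cumulative_knockouts(net))
```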

And basically once the “catness” of the images is lost, things become more and more alien from there on out, ending either in apparent randomness, or sometimes barren “zeroness”.

Rather than zeroing out modules, we can instead randomize the weights in them (perhaps a bit like the effect of a tumor rather than a stroke in a brain)—but the results are usually at least qualitatively similar:

Something else we can do is just to progressively mix randomness uniformly into every weight in the network (perhaps a bit like globally “drugging” a brain). Here are three examples where in each case 0%, 1%, 2%, … of randomness was added—all “fading away” in a very similar way:
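One way to read “mixing in a fraction of randomness” is a blend of each weight with an independent random value; that exact formula is our assumption, not something the text specifies. With it, a fraction of 1 corresponds to fully randomized weights. The weights below are hypothetical.

```python
import random

def mix_in_noise(weights, fraction, seed=0):
    """Blend every weight with a random value: w -> (1 - fraction) * w + fraction * r,
    with r drawn from Normal(0, 1). fraction = 0 leaves the weights alone."""
    rng = random.Random(seed)
    return [(1 - fraction) * w + fraction * rng.gauss(0, 1) for w in weights]

weights = [0.8, -1.2, 0.05, 2.0]  # hypothetical flattened weights
sweep = [mix_in_noise(weights, k / 100) for k in range(6)]  # 0%, 1%, ..., 5%
```

Reusing the same seed at every fraction means the weights drift along one fixed random direction, so successive stages of the sweep are directly comparable.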

And similarly, we can progressively scale down towards zero (in 1% increments: 100%, 99%, 98%, …) all the weights in the network:

Or we can progressively increase the numerical values of the weights—eventually in some sense “blowing the mind” of the network (and going a bit “psychedelic” in the process):
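Both the scale-down and scale-up sweeps amount to multiplying every weight by a common factor. In this minimal sketch the weights are hypothetical, the 1% steps for shrinking follow the text, and the 10% steps for growing are our assumption.

```python
def scale_weights(weights, factor):
    """Multiply every weight by the same factor (1.0 leaves the network unchanged)."""
    return [factor * w for w in weights]

weights = [0.8, -1.2, 0.05, 2.0]  # hypothetical flattened weights
shrinking = [scale_weights(weights, (100 - k) / 100) for k in range(4)]     # 100%, 99%, 98%, 97%
growing = [scale_weights(weights, (100 + 10 * k) / 100) for k in range(4)]  # 100%, 110%, 120%, 130%
```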

Minds in Rulial Space

We can think of what we’ve done so far as exploring some of the “natural history” of what’s out there in generative AI space—or as providing a small taste of at least one approximation to the kind of mental imagery one might encounter in alien minds. But how does this fit into a more general picture of alien minds and what they might be like?

With the concept of the ruliad we finally have a principled way to talk about alien minds—at least at a theoretical level. And the key point is that any alien mind—or, for that matter, any mind—can be thought of as “observing” or sampling the ruliad from its own particular point of view, or in effect, its own position in rulial space.

The ruliad is defined to be the entangled limit of all possible computations: a unique object with an inevitable structure. And the idea is that anything—whether one interprets it as a phenomenon or an observer—must be part of the ruliad. The key to our Physics Project is then that “observers like us” have certain general characteristics. We are computationally bounded, with “finite minds” and limited sensory input. And we have a certain coherence that comes from our belief in our persistence in time, and our consistent thread of experience. And what we then discover in our Physics Project is the rather remarkable result that from these characteristics and the general properties of the ruliad alone it’s essentially inevitable that we must perceive the universe to exhibit the fundamental physical laws it does, in particular the three big theories of twentieth-century physics: general relativity, quantum mechanics and statistical mechanics.

But what about more detailed aspects of what we perceive? Well, that will depend on more detailed aspects of us as observers, and of how our minds are set up. And in a sense, each different possible mind can be thought of as existing in a certain place in rulial space. Different human minds are mostly close in rulial space, animal minds further away, and more alien minds still further. But how can we characterize what these minds are “thinking about”, or how these minds “perceive things”?

From inside our own minds we can form a sense of what we perceive. But we don’t really have good ways to reliably probe what another mind perceives. But what about what another mind imagines? Well, that’s where what we’ve been doing here comes in. Because with generative AI we’ve got a mechanism for exposing the “mental imagery” of an “AI mind”.

We could consider doing this with words and text, say with an LLM. But for us humans images have a certain fluidity that text does not. Our eyes and brains can perfectly well “see” and absorb images even if we don’t “understand” them. But it’s very difficult for us to absorb text that we don’t “understand”; it usually tends to seem just like a kind of “word soup”.

But, OK, so we generate “mental imagery” from “minds” that have been “made alien” by various modifications. How come we humans can understand anything such minds make? Well, it’s a bit like one person being able to understand the thoughts of another. Their brains—and minds—are built differently. And their “internal view” of things will inevitably be different. But the crucial idea—that’s for example central to language—is that it’s possible to “package up” thoughts into something that can be “transported” to another mind. Whatever some particular internal thought might be, by the time we can express it with words in a language, it’s possible to communicate it to another mind that will “unpack” it into different internal thoughts.

It’s a nontrivial fact of physics that “pure motion” in physical space is possible; in other words, that an “object” can be moved “without change” from one place in physical space to another. And now, in a sense, we’re asking about pure motion in rulial space: can we move something “without change” from one mind at one place in rulial space to another mind at another place? In physical space, things like particles—as well as things like black holes—are the fundamental elements that are imagined to move without change. So what’s now the analog in rulial space? It seems to be concepts—as often, for example, represented by words.

So what does that mean for our exploration of generative AI “alien minds”? We can ask whether, when we move from one potentially alien mind to another, concepts are preserved. We don’t have a perfect proxy for this (though we could make a better one by appropriately training neural net classifiers). But as a first approximation this is like asking whether, as we “change the mind” (or move in rulial space), we can still recognize the “concept” the mind produces. Or, in other words, if we start with a “mind” that’s generating a cat in a party hat, will we still recognize the concepts of cat or hat in what a “modified mind” produces?

And what we’ve seen is that sometimes we do, and sometimes we don’t. And for example when we looked at “cat island” we saw a certain boundary beyond which we could no longer recognize “catness” in the image that was produced. And by studying things like cat island (and particularly its analogs when not just the “prompt” but also the underlying neural net is changed) it should be possible to map out how far concepts “extend” across alien minds.

It’s also possible to think about a kind of inverse question: just what is the extent of a mind in rulial space? Or, in other words, what range of points of view, ultimately about the ruliad, can it hold? Will it be “narrow-minded”, able to think only in particular ways, with particular concepts? Or will it be more “broad-minded”, encompassing more ways of thinking, with more concepts?

In a sense the whole arc of the intellectual development of our civilization can be thought of as corresponding to an expansion in rulial space: with us progressively being able to think in new ways, and about new things. And as we expand in rulial space, we are in effect encompassing more of what we previously would have had to consider the domain of an alien mind.

When we look at images produced by generative AI away from the specifics of human experience—say in interconcept space, or with modified rules of generation—we may at first be able to make little from them. Like inkblots or arrangements of stars we’ll often find ourselves wanting to say that what we see looks like this or that thing we know.

But the real question is whether we can devise some way of describing what we see that allows us to build thoughts on what we see, or “reason” about it. And what’s very typical is that we manage to do this when we come up with a general “symbolic description” of what we see, say captured with words in natural language (or, now, computational language). Before we have those words, or that symbolic description, we’ll tend just not to absorb what we see.

And so, for example, even though nested patterns have always existed in nature, and were even explicitly created by mosaic artisans in the early 1200s, they seem to have never been systematically noticed or discussed at all until the latter part of the 20th century, when finally the framework of “fractals” was developed for talking about them.

And so it may be with many of the forms we’ve seen here. As of today, we have no name for them, no systematic framework for thinking about them, and no reason to view them as important. But particularly if the things we do repeatedly show us such forms, we’ll eventually come up with names for them, and start incorporating them into the domain that our minds cover.

And in a sense what we’ve done here can be thought of as showing us a preview of what’s out there in rulial space, in what’s currently the domain of alien minds. In the general exploration of ruliology, and the investigation of what arbitrary simple programs in the computational universe do, we’re able to jump far across the ruliad. But it’s typical that what we see is not something we can connect to things we’re familiar with. In what we’re doing here, we’re moving only much smaller distances in rulial space. We’re starting from generative AI that’s closely aligned with current human development—having been trained from images that we humans have put on the web, etc. But then we’re making small changes to our “AI mind”, and looking at what it now generates.

What we see is often surprising. But it’s still close enough to where we “currently are” in rulial space that we can—at least to some extent—absorb and reason about what we’re seeing. Still, the images often don’t “make sense” to us. And, yes, quite possibly the AI has invented something that has a rich and “meaningful” inner structure. But it’s just that we don’t (yet) have a way to talk about it—and if we did, it would immediately “make perfect sense” to us.

So if we see something we don’t understand, can we just “train a translator”? At some level the answer must be yes. Because the Principle of Computational Equivalence implies that ultimately there’s a fundamental uniformity to the ruliad. But the problem is that the translator is likely to have to do an irreducible amount of computational work. And so it won’t be implementable by a “mind like ours”. Still, even though we can’t create a “general translator” we can expect that certain features of what we see will still be translatable—in effect by exploiting certain pockets of computational reducibility that must necessarily exist even when the system as a whole is full of computational irreducibility. And operationally what this means in our case is that the AI may in effect have found certain regularities or patterns that we don’t happen to have noticed but that are useful in exploring further from the “current human point” in rulial space.

It’s very challenging to get an intuitive understanding of what rulial space is like. But the approach we’ve taken here is for me a promising first effort in “humanizing” rulial space, and seeing just how we might be able to relate to what is so far the domain of alien minds.

Appendix: How Does the Generative AI Work?

In the main part of this piece, we’ve mostly just talked about what generative AI does, not how it works inside. Here I’ll go a little deeper into what’s inside the particular type of generative AI system that I’ve used in my explorations. It’s a method called stable diffusion, and its operation is in many ways both clever and surprising. As it’s implemented today it’s steeped in fairly complicated engineering details. To what extent these will ultimately be necessary isn’t clear. But in any case here I’ll mostly concentrate on general principles, and on giving a broad outline of how generative AI can be used to produce images.

The Distribution of Typical Images

At the core of generative AI is the ability to produce things of some particular type that “follow the patterns of” known things of that type. So, for example, large language models (LLMs) are intended to produce text that “follows the patterns” of text written by humans, say on the web. And generative AI systems for images are similarly intended to produce images that “follow the patterns” of images put on the web, etc.

But what kinds of patterns exist in typical images, say on the web? Here are some examples of “typical images”—scaled down to 32×32 pixels and taken from a standard set of 60,000 images:

And as a very first thing, we can ask what colors show up in these images. They’re not uniform in RGB space:

But what about the positions of different colors? Adjusting to accentuate color differences, the “average image” turns out to have a curious “HAL’s eye” look (presumably with blue for sky at the top, and brown for earth at the bottom):

But just picking pixels separately—even with the color distribution inferred from actual images—won’t produce images that in any way look “natural” or “realistic”:

And the immediate issue is that the pixels aren’t really independent; most pixels in most images are correlated in color with nearby pixels. And in a first approximation one can capture this for example by fitting the list of colors of all the pixels to a multivariate Gaussian distribution with a covariance matrix that represents their correlation. Sampling from this distribution gives images like these—that indeed look somehow “statistically natural”, even if there isn’t appropriate detailed structure in them:
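The difference between sampling pixels independently and sampling from a fitted multivariate Gaussian can be checked in a few lines. This sketch assumes synthetic data: instead of the 60,000 real 32×32 images, it generates hypothetical 8×8 color images whose neighboring pixels are correlated (each pixel averages a 3×3 patch of noise). It computes the flattened "average image", draws pixels independently, draws from a Gaussian fitted to the full pixel covariance, and measures how strongly each pixel correlates with its right-hand neighbor.

```python
import numpy as np

def synthetic_images(n, size, rng):
    """Hypothetical stand-in for real photos: smooth random color fields.

    Each pixel averages a 3x3 block of independent noise (with wraparound),
    so horizontally adjacent pixels have a correlation of about 2/3.
    """
    noise = rng.normal(size=(n, size, size, 3))
    smooth = np.zeros_like(noise)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            smooth += np.roll(noise, shift=(dy, dx), axis=(1, 2))
    return 0.5 + 0.2 * smooth / 9.0  # values centered on mid-gray

def neighbor_correlation(images):
    """Correlation between each pixel's channel values and its right-hand neighbor's."""
    left = images[:, :, :-1, :].ravel()
    right = images[:, :, 1:, :].ravel()
    return np.corrcoef(left, right)[0, 1]

rng = np.random.default_rng(0)
size = 8
train = synthetic_images(3000, size, rng)
flat = train.reshape(len(train), -1)  # one 8*8*3 = 192-number vector per image

mean = flat.mean(axis=0)              # the "average image", flattened
cov = np.cov(flat, rowvar=False)      # 192x192 pixel-pixel covariance

# Every pixel value drawn on its own: no spatial structure at all.
independent = rng.normal(mean, flat.std(axis=0), size=(500, flat.shape[1]))
# All pixel values drawn jointly from the fitted multivariate Gaussian.
gaussian = rng.multivariate_normal(mean, cov, size=500)

shape = (-1, size, size, 3)
print("training data:", neighbor_correlation(train))
print("independent pixels:", neighbor_correlation(independent.reshape(shape)))
print("multivariate Gaussian:", neighbor_correlation(gaussian.reshape(shape)))
```

The independent samples show essentially no neighbor correlation, while the Gaussian samples reproduce the correlation of the training data. That is the sense in which such samples look "statistically natural" without containing any real structure.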

So, OK, how can one do better? The basic idea is to use neural nets, which can in effect encode detailed long-range connections between pixels. In some way it’s similar to what’s done in LLMs like ChatGPT—where one has to deal with long-range connections between words in text. But for images it’s structurally a bit more difficult, because in some sense one has to “consistently fit together 2D patches” rather than just progressively extend a 1D sequence.

And the typical way this is done at first seems a bit bizarre. The basic idea is to start with a random array of pixels—corresponding in effect to “pure noise”—and then progressively to “reduce the noise” to end up with a “reasonable image” that follows the patterns of typical images, all the while guided by some prompt that says what one wants the “reasonable image” to be of.

Attractors and Inverse Diffusion

How does one go from randomness to definite “reasonable” things? The key is to use the notion of attractors. In a very simple case, one might have a system—like this “mechanical” example—where from any “randomly chosen” initial condition one also evolves to one of (here) two definite (fixed-point) attractors:
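A minimal stand-in for the "mechanical" two-attractor example, assuming an overdamped particle sliding in a double-well potential (the system in the post is shown only as a picture). Every starting point ends at x = −1 or x = +1, depending only on which side of 0 it starts from.

```python
import random

def evolve(x, steps=2000, dt=0.01):
    """Overdamped motion in the double-well potential V(x) = (x**2 - 1)**2 / 4.

    The force is -V'(x) = x - x**3, so x = -1 and x = +1 are stable fixed points
    (attractors), while x = 0 is an unstable one separating their basins.
    """
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x

random.seed(1)
starts = [random.uniform(-2, 2) for _ in range(10)]  # "randomly chosen" initial conditions
finals = [round(evolve(x0), 6) for x0 in starts]
for x0, xf in zip(starts, finals):
    print(f"{x0:+.3f} -> {xf:+.1f}")
```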

One has something similar in a neural net that’s for example trained to recognize digits:

Regardless of exactly how each digit is written, or noise that gets added to it, the network will take this input and evolve to an attractor corresponding to a digit.

Sometimes there can be lots of attractors. Like in this (“class 2”) cellular automaton evolving down the page, many different initial conditions can lead to the same attractor, but there are many possible attractors, corresponding to different final patterns of stripes:
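Many attractors of this kind can be reproduced with an elementary cellular automaton. As an assumption, the sketch uses elementary rule 4, a simple class 2 rule in which only isolated black cells survive. It is not necessarily the rule in the picture. Every random initial condition freezes into a fixed pattern, but different initial conditions freeze into different patterns.

```python
import random

def step(cells, rule):
    """One step of an elementary (2-color, nearest-neighbor) cellular automaton
    on a cyclic row, using the standard rule numbering."""
    n = len(cells)
    return [
        (rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def final_state(cells, rule, max_steps=100):
    """Evolve until the row stops changing and return that fixed-point attractor."""
    for _ in range(max_steps):
        nxt = step(cells, rule)
        if nxt == cells:
            return cells
        cells = nxt
    return None  # no fixed point reached within max_steps

random.seed(0)
inits = [[random.randint(0, 1) for _ in range(40)] for _ in range(5)]
finals = [final_state(c, 4) for c in inits]
for f in finals:
    print("".join("#" if v else "." for v in f))
```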

The same can be true for example in 2D cellular automata, where now the attractors can be thought of as being different “images” with structure determined by the cellular automaton rule:

But what if one wants to arrange to have particular images as attractors? Here’s where the somewhat surprising idea of “stable diffusion” can be used. Imagine we start with two possible images, and then in a series of steps progressively add noise to them:
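The noising steps can be sketched with the variance-preserving update commonly used in diffusion models, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·noise, where ᾱ_t is the fraction of the original signal left after t steps. The linear schedule (β from 10⁻⁴ to 0.02 over 1000 steps) and the two hypothetical 16×16 images (a square and a bar) are assumptions chosen for illustration, not details of the system used in the post.

```python
import numpy as np

def noise_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule; returns alpha_bar_t, the fraction of signal left at step t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

image_a = np.zeros((16, 16))
image_a[4:12, 4:12] = 1.0  # hypothetical image A: a square
image_b = np.zeros((16, 16))
image_b[:, 7:9] = 1.0      # hypothetical image B: a vertical bar

rng = np.random.default_rng(0)
alpha_bar = noise_schedule()
steps_to_show = (0, 100, 300, 600, 999)
correlations = {}
for name, img in (("A", image_a), ("B", image_b)):
    # Correlation between the noised image and the clean one at each chosen step.
    correlations[name] = [
        float(np.corrcoef(add_noise(img, t, alpha_bar, rng).ravel(), img.ravel())[0, 1])
        for t in steps_to_show
    ]
    print(name, [round(c, 3) for c in correlations[name]])
```

By the last step almost no trace of the original image remains, so both images have been turned into what is essentially the same kind of pure noise.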

Here’s the bizarre thing we now want to do: train a neural net to take the image we get at a particular step, and “go backwards”, removing noise from it. The neural net we’ll use for this is somewhat complicated, with “convolutional” pieces that basically operate on blocks of nearby pixels, and “transformers” that get applied with certain weights to more distant pixels. Schematically in Wolfram Language the network looks at a high level like this:

And roughly what it’s doing is to make an informationally compressed version of each image, and then to expand it again (through what is usually called a “U-net” neural net). We start with an untrained version of this network (say just randomly initialized). Then we feed it a couple of million examples of noisy versions of our two images, together with the denoised outputs we want in each case.

Then if we take the trained neural net and successively apply it, for example, to a noised version of one of the images, the net will “correctly” determine that the “denoised” version is the pure original image:

But what if we apply this network to pure noise? The network has been set up to always eventually evolve to one of two attractors, corresponding to our two images. But which it “chooses” in a particular case will depend on the details of the initial noise, so in effect the network will seem to be picking at random which of the two images to “fish” out of the noise:
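This "fishing out of noise" behavior can be reproduced without training anything, under one big simplification. If the training set consists of just two images, the best possible denoiser can be written down exactly: its estimate of the clean image is a softmax-weighted average of the two. The sketch substitutes that exact denoiser for the trained U-net and runs a deterministic, DDIM-style reverse process from pure noise. The linear β schedule and the two images (a square and a bar) are hypothetical choices.

```python
import numpy as np

def noise_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction alpha_bar_t for a linear variance schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def ideal_denoiser(x_t, t, alpha_bar, targets):
    """Exact posterior-mean estimate of the clean image if the training set were
    just `targets` (equally likely) and x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = alpha_bar[t]
    d2 = np.array([np.sum((x_t - np.sqrt(a) * x) ** 2) for x in targets])
    logits = -d2 / (2.0 * (1.0 - a))
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    w /= w.sum()
    return sum(wi * x for wi, x in zip(w, targets))

def generate(targets, alpha_bar, rng):
    """Start from pure noise and repeatedly denoise with deterministic DDIM-style steps."""
    x = rng.standard_normal(targets[0].shape)
    for t in range(len(alpha_bar) - 1, 0, -1):
        x0_hat = ideal_denoiser(x, t, alpha_bar, targets)
        eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
        x = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat
    return ideal_denoiser(x, 0, alpha_bar, targets)

image_a = np.zeros((16, 16))
image_a[4:12, 4:12] = 1.0  # hypothetical image A: a square
image_b = np.zeros((16, 16))
image_b[:, 7:9] = 1.0      # hypothetical image B: a vertical bar

rng = np.random.default_rng(0)
alpha_bar = noise_schedule()
outcomes = []
for _ in range(10):
    out = generate([image_a, image_b], alpha_bar, rng)
    label = "A" if np.allclose(out, image_a) else "B" if np.allclose(out, image_b) else "?"
    outcomes.append(label)
print(outcomes)
```

Each run lands on one of the two images. Which one it lands on is determined entirely by the initial noise, so the two images act as the two attractors of the denoising process.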

How does this apply to our original goal of generating images “like” those found for example on the web? Well, instead of just training our “denoising” (or “inverse diffusion”) network on a couple of “target” images, let’s imagine we train it on billions of images from the web. And let’s also assume that our network isn’t big enough to store all those images in any kind of explicit way.

In the abstract it’s not clear what the network will do. But the remarkable empirical fact is that it seems to manage to successfully generate (“from noise”) images that “follow the general patterns” of the images it was trained from. There isn’t any clear way to “formally validate” this success. It’s really just a matter of human perception: to us the images (generally) “look right”.

It could be that with a different (alien?) system of perception we’d immediately see “something wrong” with the images. But for purposes of human perception, the neural net seems to give “reasonable-looking” images—perhaps not least because the neural net operates at least approximately the way our brains and our processes of perception seem to operate.

Injecting a Prompt

We’ve described how a denoising neural net seems to be able to start from some configuration of random noise and generate a “reasonable-looking” image. And from any particular configuration of noise, a given neural net will always generate the same image. But there’s no way to tell what that image will be of; it’s just something to empirically explore, as we did above.

But what if we want to “guide” the neural net to generate an image that we’d describe as being of a definite thing, like “a cat in a party hat”? We could imagine “continually checking” whether the image we’re generating would be recognized by a neural net as being of what we wanted. And conceptually that’s what we can do. But we also need a way to “redirect” the image generation if it’s “not going in the right direction”. And a convenient way to do this is to mix a “description of what we want” right into the denoising training process. In particular, if we’re training to “recover” a particular image, we mix a description of that image right alongside the image itself.

And here we can make use of a key feature of neural nets: that ultimately they operate on arrays of (real) numbers. So whether they’re dealing with images composed of pixels, or text composed of words, all these things eventually have to be “ground up” into arrays of real numbers. And when a neural net is trained, what it’s ultimately “learning” is just how to appropriately transform these “disembodied” arrays of numbers.

There’s a fairly natural way to generate an array of numbers from an image: just take the triples of red, green and blue intensity values for each pixel. (Yes, we could pick a different detailed representation, but it’s not likely to matter—because the neural net can always effectively “learn a conversion”.) But what about a textual description, like “a cat in a party hat”?

We need to find a way to encode text as an array of numbers. And actually LLMs face the same issue, and we can solve it in basically the same way here as LLMs do. In the end what we want is to derive from any piece of text a “feature vector” consisting of an array of numbers that provide some kind of representation of the “effective meaning” of the text, or at least the “effective meaning” relevant to describing images.

Let’s say we train a neural net to reproduce associations between images and captions, as found for example on the web. If we feed this neural net an image, it’ll try to generate a caption for the image. If we feed the neural net a caption, it’s not realistic for it to generate a whole image. But we can look at the innards of the neural net and see the array of numbers it derived from the caption—and then use this as our feature vector. And the idea is that because captions that “mean the same thing” should be associated in the training set with “the same kind of images”, they should have similar feature vectors.
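To make the idea of a text "feature vector" concrete, here is a deliberately crude stand-in: word-count vectors over a small vocabulary, compared by cosine similarity. This is an assumption for illustration only. A learned encoder trained on image–caption pairs would also place synonyms such as "cat" and "kitten" close together, which simple word counts cannot do.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "the", "in", "on", "of", "with"}

def feature_vector(text, vocab):
    """Toy feature vector: counts of each vocabulary word (stop words dropped)."""
    counts = Counter(w for w in text.lower().split() if w not in STOP_WORDS)
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

captions = [
    "a cat in a party hat",
    "a kitten wearing a party hat",
    "a bowl of fruit on a table",
]
vocab = sorted({w for c in captions for w in c.lower().split() if w not in STOP_WORDS})
vectors = [feature_vector(c, vocab) for c in captions]
print(round(cosine(vectors[0], vectors[1]), 3))  # 0.577
print(round(cosine(vectors[0], vectors[2]), 3))  # 0.0
```

The two party-hat captions share two content words and get a similarity of about 0.577, while the unrelated caption scores 0.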

So now let’s say we want to generate a picture of a cat in a party hat. First we find the feature vector associated with the text “a cat in a party hat”. Then this is what we keep mixing in at each stage of denoising to guide the denoising process, so that we end up with an image that the image captioning network will identify as “a cat in a party hat”.
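In a two-image toy model, a prompt can be "mixed in" at every denoising step by reweighting which training image the denoiser believes it is recovering. This is only an analogy under stated assumptions. The captions and the `prompt_scores` function are hypothetical, the denoiser is the exact one for a two-image training set rather than a trained network, and in real systems the text feature vector is an extra input to the network rather than a hand-set weight.

```python
import numpy as np

def noise_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def conditional_denoiser(x_t, t, alpha_bar, targets, prompt_match):
    """Posterior-mean estimate of the clean image given x_t AND a prompt.

    prompt_match[i] plays the role of p(prompt | targets[i]); its log is added
    to the posterior over training images at every denoising step.
    """
    a = alpha_bar[t]
    d2 = np.array([np.sum((x_t - np.sqrt(a) * x) ** 2) for x in targets])
    logits = -d2 / (2.0 * (1.0 - a)) + np.log(prompt_match)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(wi * x for wi, x in zip(w, targets))

def generate(targets, prompt_match, alpha_bar, rng):
    """DDIM-style reverse process from pure noise, guided by the prompt at every step."""
    x = rng.standard_normal(targets[0].shape)
    for t in range(len(alpha_bar) - 1, 0, -1):
        x0_hat = conditional_denoiser(x, t, alpha_bar, targets, prompt_match)
        eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
        x = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat
    return conditional_denoiser(x, 0, alpha_bar, targets, prompt_match)

square = np.zeros((16, 16))
square[4:12, 4:12] = 1.0
bar = np.zeros((16, 16))
bar[:, 7:9] = 1.0
targets = [square, bar]
captions = ["a square", "a vertical bar"]  # hypothetical captions for the two images

def prompt_scores(prompt):
    """Hypothetical p(prompt | image): 1 for the matching caption, tiny otherwise."""
    return np.array([1.0 if c == prompt else 1e-9 for c in captions])

rng = np.random.default_rng(1)
alpha_bar = noise_schedule()
results = {}
for prompt in captions:
    outs = [generate(targets, prompt_scores(prompt), alpha_bar, rng) for _ in range(5)]
    # Label each output by the training image it ended up closest to.
    results[prompt] = [captions[int(np.argmin([np.abs(o - x).max() for x in targets]))] for o in outs]
    print(prompt, "->", results[prompt])
```

Without the prompt weights, the same process picks an image at random; with them, every run from noise ends on the image whose caption matches the prompt.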

The Latent Space “Trick”

The most direct way to do “denoising” is to operate directly on the pixels in an image. But it turns out there’s a considerably more efficient approach, which operates not on pixels but on “features” of the image—or, more specifically, on a feature vector which describes an image.

In a “raw image” presented in terms of pixels, there’s a lot of redundancy—which is why, for example, image formats like JPEG and PNG manage to compress raw images so much without even noticeably modifying them for purposes of typical human perception. But with neural nets it’s possible to do much greater compression, particularly if all we want to do is to preserve the “meaning” of an image, without worrying about its precise details.

And in fact as part of training a neural net to associate images with captions, we can derive a “latent representation” of images, or in effect a feature vector that captures the “important features” of the image. And then we can do everything we’ve discussed so far directly on this latent representation—decoding it only at the end into the actual pixel representation of the image.

So what does it look like to build up the latent representation of an image? With the particular setup we’re using here, it turns out that the feature vector in the latent representation still preserves the basic spatial arrangement of the image. The “latent pixels” are much coarser than the “visible” ones, and happen to be characterized by 4 numbers rather than the 3 for RGB. But we can decode things to see the “denoising” process happening in terms of “latent pixels”:

And then we can take the latent representation we get, and once again use a trained neural net to fill in a “decoding” of this in terms of actual pixels, getting out our final generated image.
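The bookkeeping of a latent representation can be sketched with a toy linear encoder. The 8× downsampling per side matches the commonly described Stable Diffusion v1 setup, in which a 512×512×3 image becomes a 64×64×4 latent. The encoder here, however, just averages 8×8 blocks and mixes the 3 color values into 4 numbers with a fixed, hypothetical matrix. A real system learns a nonlinear encoder and decoder.

```python
import numpy as np

BLOCK = 8  # downsampling factor per side

# Hypothetical fixed 4x3 matrix standing in for a learned encoder's channel mixing.
MIX = np.array([
    [0.299, 0.587, 0.114],  # brightness-like channel
    [1.0, -1.0, 0.0],       # red minus green
    [0.5, 0.5, -1.0],       # yellow minus blue
    [1.0, 1.0, 1.0],        # total intensity
])

def encode(image):
    """(H, W, 3) pixels -> (H/8, W/8, 4) "latent pixels": block average, then mix channels."""
    h, w, c = image.shape
    blocks = image.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK, c).mean(axis=(1, 3))
    return blocks @ MIX.T

def decode(latent):
    """Undo the channel mixing (least squares), then blow each latent pixel up to 8x8."""
    rgb = latent @ np.linalg.pinv(MIX).T
    return rgb.repeat(BLOCK, axis=0).repeat(BLOCK, axis=1)

rng = np.random.default_rng(0)
coarse = rng.random((64, 64, 3))
image = coarse.repeat(BLOCK, axis=0).repeat(BLOCK, axis=1)  # 512x512 image, constant on blocks
latent = encode(image)
print(image.shape, "->", latent.shape, "compression:", image.size / latent.size)
print("round trip exact for block-constant images:", np.allclose(decode(latent), image))
```

The latent keeps the spatial layout (one latent pixel per 8×8 block) while storing 48 times fewer numbers. This toy decoder can only reproduce images that are constant on each block, whereas a learned decoder fills in plausible detail.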

An Analogy in Simple Programs

Generative AI systems work by having attractors that are carefully constructed through training so that they correspond to “reasonable outputs”. A large part of what we’ve done above is to study what happens to these attractors when we change the internal parameters of the system (neural net weights, etc.). What we’ve seen has been complicated, and, indeed, often quite “alien looking”. But is there perhaps a simpler setup in which we can see similar core phenomena?

By the time we’re thinking about creating attractors for realistic images, etc., it’s inevitable that things are going to be complicated. But what if we look at systems with much simpler setups? For example, consider a dynamical system whose state is characterized just by a single number, such as an iterated map on the interval, like $x \mapsto a x (1 - x)$.

Starting from a uniform array of possible x values, we can show down the page which values of x are achieved at successive iterations:

For a = 2.9, the system evolves from any initial value to a single attractor, which consists of a single fixed final value. But if we change the “internal parameter” a to 3.1, we now get two distinct final values. And at the “bifurcation point” a = 3 there’s a sudden change from one to two distinct final values. And indeed in our generative AI system it’s fairly common to see similar discontinuous changes in behavior even when an internal parameter is continuously changed.
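The bifurcation is easy to check numerically. This sketch iterates the logistic map from a spread of starting values in (0, 1), discards the transient, and collects the distinct values that remain. Rounding to six decimals is an arbitrary choice for merging values that agree up to floating-point error.

```python
def attractor(a, x0, transient=2000, keep=64):
    """Iterate x -> a*x*(1 - x); return the distinct values visited after the transient."""
    x = x0
    for _ in range(transient):
        x = a * x * (1 - x)
    seen = set()
    for _ in range(keep):
        x = a * x * (1 - x)
        seen.add(round(x, 6))
    return sorted(seen)

starts = [i / 20 for i in range(1, 20)]  # initial values spread across (0, 1)
for a in (2.9, 3.1):
    outcomes = {tuple(attractor(a, x0)) for x0 in starts}
    print(a, outcomes)
```

For a = 2.9 every start ends on the single value 1 − 1/a ≈ 0.655. For a = 3.1 the orbit alternates between two values, about 0.558 and 0.765.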

As another example—slightly closer to image generation—consider (as above) a 1D cellular automaton that exhibits class 2 behavior, and evolves from any initial state to some fixed final state that one can think of as an attractor for the system:

Which attractor one reaches depends on the initial condition one starts from. But—in analogy to our generative AI system—we can think of all the attractors as being “reasonable outputs” for the system. But now what happens if we change the parameters of the system, or in this case, the cellular automaton rule? In particular, what will happen to the attractors? It’s like what we did above in changing weights in a neural net—but a lot simpler.

The particular rule we’re using here has 4 possible colors for each cell, and is defined by just 64 discrete values (one for each of the 4³ = 64 possible configurations of a cell and its two neighbors), each from 0 to 3. So let’s say we randomly change just one of those values at a time. Here are some examples of what we get, always starting from the same initial condition as in the first picture above:

With a couple of exceptions these seem to produce results that are at least “roughly similar” to what we got without changing the rule. In analogy to what we did above, the cat might have changed, but it’s still more or less a cat. But let’s now try “progressive randomization”, where we modify successively more values in the definition of the rule. For a while we again get “roughly similar” results, but then—much like in our cat examples above—things eventually “fall apart” and we get “much more random” results:
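Both rule-mutation experiments can be sketched directly on a rule table. The table has one entry for each of the 4³ = 64 neighborhoods of a 4-color, nearest-neighbor cellular automaton. The particular base rule is hypothetical (chosen so that cells tend to settle) and is not the one in the pictures. The sketch first changes one entry at a time. It then changes progressively more entries in a fixed random order and reports what fraction of the final row differs from the unmodified rule's final row.

```python
import random

K = 4  # colors per cell; 4**3 = 64 possible neighborhoods

def make_base_rule():
    """Hypothetical 64-entry rule table (not the rule used in the post).

    Entry 16*left + 4*center + right is the new value of the center cell.
    A cell that matches a neighbor keeps its color; otherwise it takes the
    smallest color in its neighborhood.
    """
    table = []
    for left in range(K):
        for center in range(K):
            for right in range(K):
                table.append(center if center in (left, right) else min(left, center, right))
    return table

def evolve(table, cells, steps):
    """Run the cellular automaton on a cyclic row of cells."""
    n = len(cells)
    for _ in range(steps):
        cells = [table[16 * cells[i - 1] + 4 * cells[i] + cells[(i + 1) % n]]
                 for i in range(n)]
    return cells

def mutate_one(table, rng):
    """Copy of the table with one randomly chosen entry set to a different value."""
    new = list(table)
    i = rng.randrange(len(new))
    new[i] = rng.choice([v for v in range(K) if v != new[i]])
    return new

def progressively_randomize(table, rng):
    """Yield tables with 0, 1, 2, ..., 64 entries changed; each extends the previous one."""
    order = list(range(len(table)))
    rng.shuffle(order)
    current = list(table)
    yield list(current)
    for i in order:
        current[i] = rng.choice([v for v in range(K) if v != table[i]])
        yield list(current)

def fraction_changed(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
init = [rng.randrange(K) for _ in range(60)]
base = make_base_rule()
reference = evolve(base, init, 60)

# Single-entry changes, each compared with the unmodified rule's final row.
for _ in range(5):
    variant = mutate_one(base, rng)
    print("one entry:", round(fraction_changed(evolve(variant, init, 60), reference), 2))

# Progressive randomization: more and more entries changed.
for k, table in enumerate(progressively_randomize(base, rng)):
    if k % 8 == 0:
        print(k, "entries:", round(fraction_changed(evolve(table, init, 60), reference), 2))
```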

One important difference between “stable diffusion” and cellular automata is that while in cellular automata, the evolution can lead to continued change forever, in stable diffusion there’s an annealing process used that always makes successive steps “progressively smaller”—and essentially forces a fixed point to be reached.
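The effect of annealing can be illustrated with a noise schedule whose levels shrink geometrically. The values (σ from 80 down to 0.002 in 50 levels) are hypothetical. If each update may move the state by at most the drop in noise level, the total movement is bounded by σ_max − σ_min, so the state has to settle down. A cellular automaton update has no such shrinking budget.

```python
import random

def geometric_sigmas(sigma_max=80.0, sigma_min=0.002, n=50):
    """Hypothetical noise levels falling geometrically from sigma_max to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (n - 1))
    return [sigma_max * ratio ** i for i in range(n)]

sigmas = geometric_sigmas()
drops = [hi - lo for hi, lo in zip(sigmas, sigmas[1:])]

rng = random.Random(0)
x, moves = 0.0, []
for d in drops:
    # Each update may point anywhere, but its size is capped by the drop in noise level.
    move = d * rng.uniform(-1.0, 1.0)
    x += move
    moves.append(abs(move))

print(round(drops[0], 3), round(drops[-1], 6))  # the step-size budget shrinks
print(round(sum(moves), 3), "<=", round(sigmas[0] - sigmas[-1], 3))
```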

But notwithstanding this, we can try to get a closer analogy to image generation by looking (again as above) at 2D cellular automata. Here’s an example of the (not-too-exciting-as-images) “final states” reached from three different initial states in a particular rule:

And here’s what happens if one progressively changes the rule:

At first one still gets “reasonable-according-to-the-original-rule” final states. But if one changes the rule further, things get “more alien”, until they look to us quite random.
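A 2D analog can be sketched with an outer-totalistic rule on a small torus. As an assumption, the base rule is a majority vote over a cell and its four neighbors, stored as a 10-entry table indexed by center value and neighbor count. The sketch flips progressively more table entries in a fixed random order and reports the density of black cells in the state reached from each of three initial conditions after 50 steps. That state is not always a fixed point, since synchronous majority voting can also fall into period-2 cycles.

```python
import numpy as np

def step(grid, table):
    """Outer-totalistic 2D CA on a torus: new = table[5 * center + live von Neumann neighbors]."""
    neighbors = (np.roll(grid, 1, axis=0) + np.roll(grid, -1, axis=0)
                 + np.roll(grid, 1, axis=1) + np.roll(grid, -1, axis=1))
    return table[5 * grid + neighbors]

def run(grid, table, steps=50):
    for _ in range(steps):
        grid = step(grid, table)
    return grid

# Hypothetical base rule: majority vote over a cell and its 4 neighbors.
# Entries 0-4: center 0 with 0..4 live neighbors; entries 5-9: center 1.
base = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1])

rng = np.random.default_rng(0)
inits = [rng.integers(0, 2, size=(32, 32)) for _ in range(3)]
order = rng.permutation(len(base))

for k in range(len(base) + 1):
    table = base.copy()
    table[order[:k]] ^= 1  # flip the first k entries of a fixed random order
    finals = [run(g, table) for g in inits]
    print(k, [round(float(f.mean()), 2) for f in finals])  # density of 1s in each state
```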

In changing the rule, one is in effect “moving in rulial space”. And by looking at how this works in cellular automata, one can get a certain amount of intuition. (Changes to the rule in a cellular automaton seem a bit like “changes to the genotype” in biology—with the behavior of the cellular automaton representing the corresponding “phenotype”.) But seeing how “rulial motion” works in a generative AI that’s been trained on “human-style input” gives a more accessible and humanized picture of what’s going on, even if it seems still further out of reach in terms of any kind of traditional explicit formalization.

Thanks

This project is the first I’ve been able to do with our new Wolfram Institute. I thank our Fourmilab Fellow Nik Murzin and Ruliad Fellow Richard Assar for help. I also thank Jeff Arle, Nicolò Monti, Philip Rosedale and the Wolfram Research Machine Learning Group.

LLM Tech and a Lot More: Version 13.3 of Wolfram Language and Mathematica

Stephen Wolfram — Wed, 28 Jun 2023 18:02:59 +0000

The Leading Edge of 2023 Technology … and Beyond

Today we’re launching Version 13.3 of Wolfram Language and Mathematica—both available immediately on desktop and cloud. It’s only been 196 days since we released Version 13.2, but there’s a lot that’s new, not least a whole subsystem around LLMs.

Last Friday (June 23) we celebrated 35 years since Version 1.0 of Mathematica (and what’s now Wolfram Language). And to me it’s incredible how far we’ve come in these 35 years—yet how consistent we’ve been in our mission and goals, and how well we’ve been able to just keep building on the foundations we created all those years ago.

And when it comes to what’s now Wolfram Language, there’s a wonderful timelessness to it. We’ve worked very hard to make its design as clean and coherent as possible—and to make it a timeless way to elegantly represent computation and everything that can be described through it.

Last Friday I fired up Version 1 on an old Mac SE/30 computer (with 2.5 megabytes of memory), and it was a thrill to see functions like Plot and NestList work just as they would today—albeit a lot slower. And it was wonderful to be able to take (on a floppy disk) the notebook I created with Version 1 and have it immediately come to life on a modern computer.

But even as we’ve maintained compatibility over all these years, the scope of our system has grown out of all recognition—with everything in Version 1 now occupying but a small sliver of the whole range of functionality of the modern Wolfram Language:

So much about Mathematica was ahead of its time in 1988, and perhaps even more about Mathematica and the Wolfram Language is ahead of its time today, 35 years later. From the whole idea of symbolic programming, to the concept of notebooks, the universal applicability of symbolic expressions, the notion of computational knowledge, and concepts like instant APIs and so much more, we’ve been energetically continuing to push the frontier over all these years.

Our long-term objective has been to build a full-scale computational language that can represent everything computationally, in a way that’s effective for both computers and humans. And now—in 2023—there’s a new significance to this. Because with the advent of LLMs our language has become a unique bridge between humans, AIs and computation.

The attributes that make Wolfram Language easy for humans to write, yet rich in expressive power, also make it ideal for LLMs to write. And, unlike traditional programming languages, Wolfram Language is intended not only for humans to write, but also to read and think in. So it becomes the medium through which humans can confirm or correct what LLMs do, to deliver computational language code that can be confidently assembled into a larger system.

The Wolfram Language wasn’t originally designed with the recent success of LLMs in mind. But I think it’s a tribute to the strength of its design that it now fits so well with LLMs—with so much synergy. The Wolfram Language is important to LLMs—in providing a way to access computation and computational knowledge from within the LLM. But LLMs are also important to Wolfram Language—in providing a rich linguistic interface to the language.

We’ve always built—and deployed—Wolfram Language so it can be accessible to as many people as possible. But the advent of LLMs—and our new Chat Notebooks—opens up Wolfram Language to vastly more people. Wolfram|Alpha lets anyone use natural language—without prior knowledge—to get questions answered. Now with LLMs it’s possible to use natural language to start defining potentially elaborate computations.

As soon as you’ve formulated your thoughts in computational terms, you can immediately “explain them to an LLM”, and have it produce precise Wolfram Language code. Often when you look at that code you’ll realize you didn’t explain yourself quite right, and either the LLM or you can tighten up your code. But anyone—without any prior knowledge—can now get started producing serious Wolfram Language code. And that’s very important in seeing Wolfram Language realize its potential to drive “computational X” for the widest possible range of fields X.

But while LLMs are “the biggest single story” in Version 13.3, there’s a lot else in Version 13.3 too—delivering the latest from our long-term research and development pipeline. So, yes, in Version 13.3 there’s new functionality not only in LLMs but also in many “classic” areas—as well as in new areas having nothing to do with LLMs.

Across the 35 years since Version 1 we’ve been able to continue accelerating our research and development process, year by year building on the functionality and automation we’ve created. And we’ve also continually honed our actual process of research and development—for the past 5 years sharing our design meetings on open livestreams.

Version 13.3 is—from its name—an “incremental release”. But—particularly with its new LLM functionality—it continues our tradition of delivering a long list of important advances and updates, even in incremental releases.

LLM Tech Comes to Wolfram Language

LLMs make possible many important new things in the Wolfram Language. And since I’ve been discussing these in a series of recent posts, I’ll give only a fairly short summary here. More details are in the other posts, both ones that have appeared, and ones that will appear soon.

To ensure you have the latest Chat Notebook functionality installed and available, use:

PacletInstall["Wolfram/Chatbook" -> "1.0.0", UpdatePacletSites -> True]

The most immediately visible LLM tech in Version 13.3 is Chat Notebooks. Go to File > New > Chat-Enabled Notebook and you’ll get a Chat Notebook that supports “chat cells” that let you “talk to” an LLM. Press ' (quote) to get a new chat cell:

You might not like some details of what got done (do you really want those boldface labels?) but I consider this pretty impressive. And it’s a great example of using an LLM as a “linguistic interface” with common sense, that can generate precise computational language, which can then be run to get a result.

This is all very new technology, so we don’t yet know what patterns of usage will work best. But I think it’s going to go like this. First, you have to think computationally about whatever you’re trying to do. Then you tell it to the LLM, and it’ll produce Wolfram Language code that represents what it thinks you want to do. You might just run that code (or the Chat Notebook will do it for you), and see if it produces what you want. Or you might read the code, and see if it’s what you want. But either way, you’ll be using computational language—Wolfram Language—as the medium to formalize and express what you’re trying to do.

When you’re doing something you’re familiar with, it’ll almost always be faster and better to think directly in Wolfram Language, and just enter the computational language code you want. But if you’re exploring something new, or just getting started on something, the LLM is likely to be a really valuable way to “get you to first code”, and to start the process of crispening up what you want in computational terms.

If the LLM doesn’t do exactly what you want, then you can tell it what it did wrong, and it’ll try to correct it—though sometimes you can end up doing a lot of explaining and having quite a long dialog (and, yes, it’s often vastly easier just to type Wolfram Language code yourself):

Sometimes the LLM will notice for itself that something went wrong, and try changing its code, and rerunning it:

And even if it didn’t write a piece of code itself, it’s pretty good at piping up to explain what’s going on when an error is generated:

And actually it’s got a big advantage here, because “under the hood” it can look at lots of details (like stack traces, error documentation, etc.) that humans usually don’t bother with.

To support all this interaction with LLMs, there’s all kinds of new structure in the Wolfram Language. In Chat Notebooks there are chat cells, and there are chatblocks (indicated by gray bars, and generated with ~) that delimit the range of chat cells that will be fed to the LLM when you press Shift+Enter on a new chat cell. And, by the way, the whole mechanism of cells, cell groups, etc. that we invented 36 years ago now turns out to be extremely powerful as a foundation for Chat Notebooks.

One can think of the LLM as a kind of “alternate evaluator” in the notebook. And there are various ways to set up and control it. The most immediate is in the menu associated with every chat cell and every chatblock (and also available in the notebook toolbar):

The first items here let you define the “persona” for the LLM. Is it going to act as a Code Assistant that writes code and comments on it? Or is it just going to be a Code Writer, that writes code without being wordy about it? Then there are some “fun” personas—like Wolfie and Birdnardo—that respond “with an attitude”. The Advanced Settings let you do things like set the underlying LLM model you want to use—and also what tools (like Wolfram Language code evaluation) you want to connect to it.

Ultimately personas are mostly just special prompts for the LLM (sometimes together with tools, etc.). And one of the new things we’ve recently launched to support LLMs is the Wolfram Prompt Repository:

The Prompt Repository contains several kinds of prompts. The first are personas, which are used to “style” and otherwise inform chat interactions. But then there are two other types of prompts: function prompts, and modifier prompts.

Function prompts are for getting the LLM to do something specific, like summarize a piece of text, or suggest a joke (it’s not terribly good at that). Modifier prompts are for determining how the LLM should modify its output, for example translating into a different human language, or keeping it to a certain length.

You can pull in function prompts from the repository into a Chat Notebook by using !, and modifier prompts using #. There’s also a ^ notation for saying that you want the “input” to the function prompt to be the cell above:

This is how you can access LLM functionality from within a Chat Notebook. But there’s also a whole symbolic programmatic way to access LLMs that we’ve added to the Wolfram Language. Central to this is LLMFunction, which acts very much like a Wolfram Language pure function, except that it gets “evaluated” not by the Wolfram Language kernel, but by an LLM:
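The sketch below illustrates the idea of a “pure function evaluated by an LLM” in plain Python. It is not Wolfram Language and does not show how LLMFunction is implemented. It assumes a template whose `` slots are filled in order, as in a Wolfram StringTemplate. It also uses a stand-in evaluator, fake_llm, so it runs without a real model. Both llm_function and fake_llm are hypothetical names. In real use, the evaluator would send the prompt to an actual model.

```python
import re
from typing import Callable


def llm_function(template: str, evaluator: Callable[[str], str]) -> Callable[..., str]:
    """Build a function that fills each `` slot in `template` with the next
    positional argument and passes the finished prompt to `evaluator`."""

    def fn(*args) -> str:
        remaining = iter(args)
        # Replace the `` slots left to right with successive arguments.
        prompt = re.sub(r"``", lambda _match: str(next(remaining)), template)
        return evaluator(prompt)

    return fn


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model (hypothetical): echoes the prompt in upper case."""
    return prompt.upper()


antonym = llm_function("Give the antonym of ``.", fake_llm)
print(antonym("hot"))  # with the stub evaluator: GIVE THE ANTONYM OF HOT.
```

The point of the design is that the template and the evaluator are independent. Swapping fake_llm for a real model call changes nothing else.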

You can access a function prompt from the Prompt Repository using LLMResourceFunction:

There’s also a symbolic representation for chats. Here’s an empty chat:

And here now we “say something”, and the LLM responds:

There’s lots of depth to both Chat Notebooks and LLM functions, as I’ve described elsewhere. There’s LLMExampleFunction for getting an LLM to follow examples you give. There’s LLMTool for giving an LLM a way to call functions in the Wolfram Language as “tools”. And there’s LLMSynthesize, which provides raw access to the LLM’s text completion and other capabilities. (And controlling all of this is $LLMEvaluator, which defines the default LLM configuration to use, as specified by an LLMConfiguration object.)

I consider it rather impressive that we’ve been able to get to the level of support for LLMs that we have in Version 13.3 in less than six months (along with building things like the Wolfram Plugin for ChatGPT, and the Wolfram ChatGPT Plugin Kit). But there’s going to be more to come, with LLM functionality increasingly integrated into Wolfram Language and Notebooks, and, yes, Wolfram Language functionality increasingly integrated as a tool into LLMs.

Line, Surface and Contour Integration

“Find the integral of the function ___” is a typical core thing one wants to do in calculus. And in Mathematica and the Wolfram Language that’s achieved with Integrate. But particularly in applications of calculus, it’s common to want to ask slightly more elaborate questions, like “What’s the integral of ___ over the region ___?”, or “What’s the integral of ___ along the line ___?”

Almost a decade ago (in Version 10) we introduced a way to specify integration over regions—just by giving the region “geometrically” as the domain of the integral:

It had always been possible to write out such an integral in “standard Integrate” form

but the region specification is much more convenient—as well as being much more efficient to process.

Finding an integral along a line is also something that can ultimately be done in “standard Integrate” form. And if you have an explicit (parametric) formula for the line this is typically fairly straightforward. But if the line is specified in a geometrical way then there’s real work to do to even set up the problem in “standard Integrate” form. So in Version 13.3 we’re introducing the function LineIntegrate to automate this.

LineIntegrate can deal with integrating both scalar and vector functions over lines. Here’s an example where the line is just a straight line:

But LineIntegrate also works for lines that aren’t straight, like this parametrically specified one:

Computing the integral also requires finding the tangent vector at every point on the curve, but LineIntegrate does that automatically:
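What LineIntegrate automates is the reduction to an ordinary one-variable integral over a parametrized curve r(t):

- The scalar case uses ds = |r′(t)| dt.
- The vector case uses F(r(t)) · r′(t) dt, where r′(t) is the tangent vector.

The plain-Python sketch below does this numerically with a midpoint rule. The helper names are hypothetical, and LineIntegrate itself works symbolically. On the unit circle, both the arc length and the circulation of F = (−y, x) come out as 2π.

```python
import math


def line_integral_scalar(f, r, dr, t0, t1, n=10_000):
    """Approximate the integral of f ds along r(t), t in [t0, t1], using ds = |r'(t)| dt."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        total += f(*r(t)) * math.hypot(*dr(t))
    return total * h


def line_integral_vector(F, r, dr, t0, t1, n=10_000):
    """Approximate the integral of F · dr = F(r(t)) · r'(t) dt, with r'(t) the tangent vector."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        total += sum(a * b for a, b in zip(F(*r(t)), dr(t)))
    return total * h


# Unit circle traversed counterclockwise, with its tangent vector.
r = lambda t: (math.cos(t), math.sin(t))
dr = lambda t: (-math.sin(t), math.cos(t))

length = line_integral_scalar(lambda x, y: 1.0, r, dr, 0.0, 2 * math.pi)
circulation = line_integral_vector(lambda x, y: (-y, x), r, dr, 0.0, 2 * math.pi)
print(length, circulation)  # both approximately 2π
```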

Line integrals are common in applications of calculus to physics. But perhaps even more common are surface integrals, representing for example total flux through a surface. And in Version 13.3 we’re introducing SurfaceIntegrate. Here’s a fairly straightforward integral of flux that goes radially outward through a sphere:

Here’s a more complicated case:

And here’s what the actual vector field looks like on the surface of the dodecahedron:
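For a surface given parametrically as r(u, v), the flux is ∬ F · (r_u × r_v) du dv. The plain-Python sketch below is a rough numerical check of the sphere case. It uses hypothetical helper names and a midpoint rule, so it is not SurfaceIntegrate’s method. It integrates F = (x, y, z) over a sphere of radius 2. The divergence theorem (div F = 3) predicts 4πR³ = 32π.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def flux(F, r, r_u, r_v, u0, u1, v0, v1, n=200):
    """Approximate the flux integral of F · (r_u × r_v) du dv with an n×n midpoint rule."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            normal = cross(r_u(u, v), r_v(u, v))  # points outward for the sphere below
            total += sum(p * q for p, q in zip(F(*r(u, v)), normal))
    return total * hu * hv


R = 2.0
# Sphere of radius R: t = polar angle in [0, π], p = azimuth in [0, 2π].
r = lambda t, p: (R * math.sin(t) * math.cos(p), R * math.sin(t) * math.sin(p), R * math.cos(t))
r_t = lambda t, p: (R * math.cos(t) * math.cos(p), R * math.cos(t) * math.sin(p), -R * math.sin(t))
r_p = lambda t, p: (-R * math.sin(t) * math.sin(p), R * math.sin(t) * math.cos(p), 0.0)

radial_flux = flux(lambda x, y, z: (x, y, z), r, r_t, r_p, 0.0, math.pi, 0.0, 2 * math.pi)
print(radial_flux, 4 * math.pi * R**3)  # the two numbers should agree closely
```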

LineIntegrate and SurfaceIntegrate deal with integrating scalar and vector functions in Euclidean space. But in Version 13.3 we’re also handling another kind of integration: contour integration in the complex plane.

We can start with a classic contour integral—illustrating Cauchy’s theorem:

Here’s a slightly more elaborate complex function

and here’s its integral around a circular contour:

Needless to say, this still gives the same result, since the new contour still encloses the same poles:
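Numerically, a contour integral over a circle z(t) = R e^{it} is just ∫ f(z(t)) z′(t) dt. The Python sketch below uses a hypothetical helper, not the Wolfram implementation, to illustrate two results:

- Cauchy’s theorem: an analytic integrand gives 0.
- The residue theorem: f(z) = 1/z + 3/(z − 1/2) has residues 1 and 3. Any circle enclosing both poles gives 2πi · 4 = 8πi, while a circle of radius 1/4 encloses only the pole at 0 and gives 2πi.

```python
import cmath
import math


def circle_integral(f, radius, n=4096):
    """Approximate the integral of f(z) dz over z(t) = radius·e^{it}, t in [0, 2π], by a midpoint rule."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(1j * (k + 0.5) * h)
        total += f(z) * 1j * z  # dz/dt = i·z on this circle
    return total * h


analytic = lambda z: z**2 + cmath.exp(z)
two_poles = lambda z: 1 / z + 3 / (z - 0.5)  # residues 1 (at 0) and 3 (at 1/2)

print(circle_integral(analytic, 1.0))    # close to 0 (Cauchy's theorem)
print(circle_integral(two_poles, 1.0))   # close to 8πi: both poles enclosed
print(circle_integral(two_poles, 2.0))   # close to 8πi again: same poles enclosed
print(circle_integral(two_poles, 0.25))  # close to 2πi: only the pole at 0 enclosed
```

For smooth periodic integrands like these, the equally spaced rule converges very quickly. That is why a plain sum is enough here.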

More impressively, here’s the result for an arbitrary radius of contour:

And here’s a plot of the (imaginary part of the) result:

Contours can be of any shape:

The result for the contour integral depends on whether the pole is inside the “Pac-Man”:
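The shape of a contour matters only through which poles it encloses. The Python sketch below integrates 1/(z − a) around a crude polygonal “Pac-Man”: a square with a wedge cut from its right side. The vertices are an arbitrary choice for illustration, and polygon_integral is a hypothetical helper. With a inside, the result is 2πi. With a in the mouth, which lies outside the contour, the result is 0.

```python
import math


def polygon_integral(f, vertices, n_per_edge=2000):
    """Approximate the integral of f(z) dz around a closed polygon whose vertices are
    complex numbers, using the midpoint rule on each straight edge."""
    total = 0j
    for k in range(len(vertices)):
        z0, z1 = vertices[k], vertices[(k + 1) % len(vertices)]
        dz = (z1 - z0) / n_per_edge
        for i in range(n_per_edge):
            total += f(z0 + (i + 0.5) * dz) * dz
    return total


# Counterclockwise "Pac-Man": a square with a wedge cut out of its right side.
pac_man = [-1 - 1j, 1 - 1j, 1 - 0.2j, 0j, 1 + 0.2j, 1 + 1j, -1 + 1j]

inside = polygon_integral(lambda z: 1 / (z + 0.5), pac_man)  # pole at -1/2: inside
mouth = polygon_integral(lambda z: 1 / (z - 0.7), pac_man)   # pole at 0.7: in the mouth
print(inside, 2j * math.pi)  # should agree closely
print(mouth)                 # should be close to 0
```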

Another Milestone for Special Functions

One can think of special functions as a way of “modularizing” mathematical results. It’s often a challenge to know that something can be expressed in terms of special functions. But once one’s done this, one can immediately apply the independent knowledge that exists about the special functions.

Even in Version 1.0 we already supported many special functions. And over the years we’ve added support for many more—to the point where we now cover everything that might reasonably be considered a “classical” special function. But in recent years we’ve also been tackling more general special functions. They’re mathematically more complex, but each one we successfully cover makes a new collection of problems accessible to exact solution and reliable numerical and symbolic computation.

Most of the “classic” special functions (like Bessel functions, Legendre functions, elliptic integrals, etc.) are in the end univariate hypergeometric functions. But one important frontier in “general special functions” consists of those corresponding to bivariate hypergeometric functions. And already in Version 4.0 (1999) we introduced one example of such a function: AppellF1. And, yes, it’s taken a while, but now in Version 13.3 we’ve finally finished doing the math and creating the algorithms to introduce AppellF2, AppellF3 and AppellF4.

On the face of it, it’s just another function—with lots of arguments—whose value we can find to any precision:

Occasionally it has a closed form:

But despite its mathematical sophistication, plots of it tend to look fairly uninspiring:

Series expansions begin to show a little more:

And ultimately this is a function that solves a pair of PDEs that can be seen as a generalization to two variables of the univariate hypergeometric ODE. So what other generalizations are possible? Paul Appell spent many years around the turn of the twentieth century looking—and came up with just four, which as of Version 13.3 now all appear in the Wolfram Language, as AppellF1, AppellF2, AppellF3 and AppellF4.
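For reference, F1 has the standard double-series definition

F1(a; b1, b2; c; x, y) = Σ_{m,n} (a)_{m+n} (b1)_m (b2)_n / ((c)_{m+n} m! n!) x^m y^n,

which converges for |x|, |y| < 1. The truncated Python version below is a hypothetical helper, not AppellF1. It truncates by total degree m + n at an arbitrary cutoff and does no analytic continuation. It can still check two classical reductions to the univariate 2F1:

- Setting y = 0 gives 2F1(a, b1; c; x).
- Setting y = x gives 2F1(a, b1 + b2; c; x).

```python
import math


def poch(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1)."""
    result = 1.0
    for i in range(k):
        result *= a + i
    return result


def appell_f1(a, b1, b2, c, x, y, terms=60):
    """Double series for Appell F1 truncated at total degree m + n < terms;
    only meaningful for |x| < 1 and |y| < 1."""
    total = 0.0
    for m in range(terms):
        for n in range(terms - m):
            coeff = poch(a, m + n) * poch(b1, m) * poch(b2, n) / (
                poch(c, m + n) * math.factorial(m) * math.factorial(n))
            total += coeff * x**m * y**n
    return total


def hyp2f1(a, b, c, x, terms=200):
    """Truncated Gauss series 2F1(a, b; c; x) for |x| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
    return total


print(appell_f1(0.5, 1.5, 2.0, 3.0, 0.3, 0.3), hyp2f1(0.5, 3.5, 3.0, 0.3))  # should agree
```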

To make special functions useful in the Wolfram Language they need to be “knitted” into other capabilities of the language—from numerical evaluation to series expansion, calculus, equation solving, and integral transforms. And in Version 13.3 we’ve passed another special function milestone, around integral transforms.

When I started using special functions in the 1970s the main source of information about them tended to be a small number of handbooks that had been assembled through decades of work. When we began to build Mathematica and what’s now the Wolfram Language, one of our goals was to subsume the information in such handbooks. And over the years that’s exactly what we’ve achieved—for integrals, sums, differential equations, etc. But one of the holdouts has been integral transforms for special functions. And, yes, we’ve covered a great many of these. But there are exotic examples that can often only “coincidentally” be done in closed form—and that in the past have only been found in books of tables.

But now in Version 13.3 we can do cases like:

And in fact we believe that in Version 13.3 we’ve reached the edge of what’s ever been figured out about Laplace transforms for special functions. The most extensive handbook—finally published in 1973—runs to about 400 pages. A few years ago we could do about 55% of the forward Laplace transforms in the book, and 31% of the inverse ones. But now in Version 13.3 we can do 100% of the ones that we can verify as correct (and, yes, there are definitely some mistakes in the book). It’s the end of a long journey, and a satisfying achievement in the quest to make as much mathematical knowledge as possible automatically computable.
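One simple way to spot-check a transform-table entry is numerical: approximate ∫₀^∞ e^{−st} f(t) dt and compare it with the claimed transform at a few values of s. The Python sketch below is only that kind of sanity check. It uses a truncated domain, a midpoint rule and an arbitrary tolerance, and it makes no claim about how the handbook entries were actually verified. It confirms the textbook pair t e^{−t} ↔ 1/(s + 1)² and rejects a deliberately wrong entry.

```python
import math


def laplace_numeric(f, s, T=40.0, n=50_000):
    """Approximate the Laplace transform of f at s by a midpoint rule on [0, T].
    Assumes e^{-st} f(t) is negligible beyond T."""
    h = T / n
    return h * sum(math.exp(-s * t) * f(t) for t in ((k + 0.5) * h for k in range(n)))


def check_entry(f, F, s_values=(0.5, 1.0, 2.0), tol=1e-6):
    """True if the claimed transform F(s) matches the numerical value at every s."""
    return all(abs(laplace_numeric(f, s) - F(s)) < tol for s in s_values)


print(check_entry(lambda t: t * math.exp(-t), lambda s: 1 / (s + 1) ** 2))  # True
print(check_entry(lambda t: t * math.exp(-t), lambda s: 1 / (s + 1)))       # False: wrong entry
```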

Finite Fields!

Ever since Version 1.0 we’ve been able to do things like factoring polynomials modulo primes. And many packages have been developed that handle specific aspects of finite fields. But in Version 13.3 we now have complete, consistent coverage of all finite fields—and operations with them.

Here’s our symbolic representation of the field of integers modulo 5 (AKA ℤ₅ or GF(5)):

And here are symbolic representations of the elements of this field—which in this particular case can be rather trivially identified with ordinary integers mod 5:

Arithmetic immediately works on these symbolic elements:

But where things get a bit trickier is when we’re dealing with prime-power fields. We represent the field GF(2³) symbolically as:

But now the elements of this field no longer have a direct correspondence with ordinary integers. We can still assign “indices” to them, though (with elements 0 and 1 being the additive and multiplicative identities). So here’s an example of an operation in this field:

But what actually is this result? Well, it’s an element of the finite field—with index 4—represented internally in the form:

The little box opens out to show the symbolic FiniteField construct:

And we can extract properties of the element, like its index:

So here, for example, are the complete addition and multiplication tables for this field:
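One common concrete model of GF(2³) works as follows:

- Elements are polynomials of degree below 3 over GF(2), stored as 3-bit integers.
- Addition is XOR.
- Products are reduced modulo the irreducible polynomial x³ + x + 1.

The Python sketch below builds both tables this way. The choice of modulus and the bit-pattern labels are assumptions and need not match the indices FiniteField uses. In both schemes, though, 0 and 1 are the additive and multiplicative identities.

```python
MODULUS = 0b1011  # x^3 + x + 1, irreducible over GF(2) (an assumed choice of modulus)


def gf8_add(a: int, b: int) -> int:
    """Addition in GF(2^3): coefficients are bits, so addition is XOR."""
    return a ^ b


def gf8_mul(a: int, b: int) -> int:
    """Shift-and-XOR multiplication, reducing modulo x^3 + x + 1 as we go."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:  # a reached degree 3: subtract (XOR) the modulus
            a ^= MODULUS
    return result


add_table = [[gf8_add(a, b) for b in range(8)] for a in range(8)]
mul_table = [[gf8_mul(a, b) for b in range(8)] for a in range(8)]
for row in mul_table:
    print(row)
```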

For the field GF(7²) these look a little more complicated:

There are various number-theoretic-like functions that one can compute for elements of finite fields. Here’s an element of GF(5¹⁰):

The multiplicative order of this (i.e. the smallest positive power of it that gives 1) is quite large:
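The multiplicative order must divide pⁿ − 1, the size of the multiplicative group. So it is enough to test the divisors of pⁿ − 1 in increasing order, using fast exponentiation. The Python sketch below does this with polynomial arithmetic modulo a user-supplied polynomial. It assumes, without checking, that this polynomial is irreducible. To keep the test small it uses GF(5²), built from x² + 2, where x has order 8, rather than GF(5¹⁰). All function names are hypothetical.

```python
def poly_mulmod(a, b, modulus, p):
    """Multiply coefficient lists a, b (lowest degree first) over GF(p) and
    reduce modulo the monic polynomial `modulus`; returns n coefficients."""
    n = len(modulus) - 1
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    # Eliminate terms of degree >= n from the top down.
    for k in range(len(prod) - 1, n - 1, -1):
        c = prod[k]
        if c:
            for i, mi in enumerate(modulus):
                prod[k - n + i] = (prod[k - n + i] - c * mi) % p
    return (prod + [0] * n)[:n]


def poly_powmod(base, e, modulus, p):
    """base**e in GF(p)[x]/(modulus) by square-and-multiply."""
    n = len(modulus) - 1
    result = [1] + [0] * (n - 1)
    while e:
        if e & 1:
            result = poly_mulmod(result, base, modulus, p)
        base = poly_mulmod(base, base, modulus, p)
        e >>= 1
    return result


def divisors(m):
    """All positive divisors of m in increasing order."""
    small, large = [], []
    d = 1
    while d * d <= m:
        if m % d == 0:
            small.append(d)
            if d != m // d:
                large.append(m // d)
        d += 1
    return small + large[::-1]


def multiplicative_order(elem, modulus, p):
    """Smallest k > 0 with elem**k == 1, assuming `modulus` is irreducible."""
    n = len(modulus) - 1
    one = [1] + [0] * (n - 1)
    if all(c % p == 0 for c in elem):
        raise ValueError("0 has no multiplicative order")
    # The order divides p**n - 1, so only divisors need testing.
    for d in divisors(p**n - 1):
        if poly_powmod(elem, d, modulus, p) == one:
            return d
    raise ValueError("modulus does not appear to be irreducible")


# GF(5^2) built from x^2 + 2 (coefficients lowest degree first).
print(multiplicative_order([0, 1], [2, 0, 1], 5))  # order of x: 8
```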

Here’s its minimal polynomial:

But where finite fields really begin to come into their own is when one looks at polynomials over them. Here, for example, is factoring over GF(3²):

Expanding this gives a finite-field-style representation of the original polynomial:

Here’s the result of expanding a power of a polynomial over GF(3²):
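Part of what makes polynomial powers over finite fields look unfamiliar is the characteristic. Over a field of characteristic 3, (a + b)³ = a³ + b³, because the middle binomial coefficients (both 3) vanish mod 3. The sketch below shows this with coefficients in the prime field GF(3) only. Handling GF(3²) coefficients would additionally need extension-field multiplication. The helper names are hypothetical.

```python
def poly_mul_mod_p(a, b, p):
    """Product of coefficient lists a, b (lowest degree first), coefficients mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


def poly_pow_mod_p(a, e, p):
    """a**e by repeated multiplication, coefficients mod p."""
    result = [1]
    for _ in range(e):
        result = poly_mul_mod_p(result, a, p)
    return result


print(poly_pow_mod_p([1, 1], 3, 3))     # (1 + x)^3 over GF(3): [1, 0, 0, 1], i.e. 1 + x^3
print(poly_pow_mod_p([2, 1, 1], 3, 3))  # (2 + x + x^2)^3: [2, 0, 0, 1, 0, 0, 1]
```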

More, Stronger Computational Geometry

We originally introduced computational geometry in a serious way into the Wolfram Language a decade ago. And ever since then we’ve been building more and more capabilities in computational geometry.

We’ve had RegionDistance for computing the distance from a point to a region for a decade. In Version 13.3 we’ve now extended RegionDistance so it can also compute the shortest distance between two regions:

We’ve also introduced RegionFarthestDistance, which computes the largest distance between a point in one given region and a point in another:

Another new function in Version 13.3 is RegionHausdorffDistance which computes the largest of all shortest distances between points in two regions; in this case it gives a closed form:
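For finite point samples, all three distances can be written down directly:

- The shortest distance is the minimum over all cross pairs.
- The farthest distance is the maximum over all cross pairs.
- The Hausdorff distance is the larger of the two directed “farthest nearest-neighbor” distances.

The Python sketch below is only a point-set approximation with hypothetical names. RegionDistance and its relatives work on exact regions and can return closed forms.

```python
import math


def region_distance(A, B):
    """Shortest distance between a point of A and a point of B."""
    return min(math.dist(a, b) for a in A for b in B)


def farthest_distance(A, B):
    """Largest distance between a point of A and a point of B."""
    return max(math.dist(a, b) for a in A for b in B)


def hausdorff_distance(A, B):
    """Max over both directions of the farthest nearest-neighbor distance."""
    a_to_b = max(min(math.dist(a, b) for b in B) for a in A)
    b_to_a = max(min(math.dist(a, b) for a in A) for b in B)
    return max(a_to_b, b_to_a)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (3.0, 4.0)]
print(region_distance(A, B), farthest_distance(A, B), hausdorff_distance(A, B))
```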

Another pair of new functions in Version 13.3 are InscribedBall and CircumscribedBall, which give (n-dimensional) balls that, respectively, just fit inside and just enclose the regions you give:
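For the special case of a triangle, both balls have elementary formulas:

- The inscribed circle has radius area/s, where s is the semiperimeter, and its center is the average of the vertices weighted by the opposite side lengths.
- The smallest enclosing circle is the circumcircle for an acute triangle. Otherwise, it has the longest side as its diameter.

The sketch assumes that “circumscribed” means smallest enclosing. That is an interpretation rather than something stated here, and the function names are hypothetical.

```python
import math


def incircle(A, B, C):
    """Center and radius of the circle inscribed in triangle ABC."""
    a, b, c = math.dist(B, C), math.dist(C, A), math.dist(A, B)  # sides opposite A, B, C
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))              # Heron's formula
    center = tuple((a * A[i] + b * B[i] + c * C[i]) / (a + b + c) for i in range(2))
    return center, area / s


def smallest_enclosing_circle(A, B, C):
    """Center and radius of the smallest circle containing triangle ABC."""
    pts = [A, B, C]
    for i in range(3):
        P, Q, R = pts[i], pts[(i + 1) % 3], pts[(i + 2) % 3]
        # The angle at R is >= 90 degrees iff (P - R) . (Q - R) <= 0;
        # then PQ is the longest side and serves as a diameter.
        if (P[0] - R[0]) * (Q[0] - R[0]) + (P[1] - R[1]) * (Q[1] - R[1]) <= 0:
            return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2), math.dist(P, Q) / 2
    # Acute triangle: the circumcircle.
    (ax, ay), (bx, by), (cx, cy) = A, B, C
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), A)


print(incircle((0, 0), (4, 0), (0, 3)))                   # 3-4-5 triangle: center (1, 1), radius 1
print(smallest_enclosing_circle((0, 0), (4, 0), (0, 3)))  # hypotenuse as diameter: (2, 1.5), 2.5
```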

In the past several versions, we’ve added functionality that combines geo computation with computational geometry. Version 13.3 has the beginning of another initiative—introducing abstract spherical geometry:

This works for spheres in any number of dimensions:

In addition to adding functionality, Version 13.3 also brings significant speed enhancements (often 10x or more) to some core operations in 2D computational geometry—making things like computing this fast even though it involves complicated regions:

Visualizations Begin to Come Alive

A great long-term strength of the Wolfram Language has been its ability to produce insightful visualizations in a highly automated way. In Version 13.3 we’re taking this further, by adding automatic “live highlighting”. Here’s a simple example, just using the function Plot. Instead of just producing static curves, Plot now automatically generates a visualization with interactive highlighting:

The same thing works for ListPlot:

The highlighting can, for example, show dates too:

There are many choices for how the highlighting should be done. The simplest thing is just to specify a style in which to highlight whole curves:

But there are many other built-in highlighting specifications. Here, for example, is "XSlice":

In the end, though, highlighting is built up from a whole collection of components—like "NearestPoint", "Crosshairs", "XDropline", etc.—that you can assemble and style for yourself:

The option PlotHighlighting defines global highlighting in a plot. But by using the Highlighted “wrapper” you can specify that only a particular element in the plot should be highlighted:

For interactive and exploratory purposes, the kind of automatic highlighting we’ve just been showing is very convenient. But if you’re making a static presentation, you’ll need to “burn in” particular pieces of highlighting—which you can do with Placed:

In indicating elements in a graphic there are different effects one can use. In Version 13.1 we introduced DropShadowing[]. In Version 13.3 we’re introducing Haloing:

Haloing can also be combined with interactive highlighting:

By the way, there are lots of nice effects you can get with Haloing in graphics. Here’s a geo example—including some parameters for the “orientation” and “thickness” of the haloing:

Publishing to Augmented + Virtual Reality

Throughout the history of the Wolfram Language, 3D visualization has been an important capability. And we’re always looking for ways to share and communicate 3D geometry. Already back in the early 1990s we had experimental implementations of VR. But at the time there wasn’t anything like the kind of infrastructure for VR that would be needed to make this broadly useful. In the mid-2010s we then introduced VR functionality based on Unity, which provides powerful capabilities within the Unity ecosystem but is not accessible outside it.

Today, however, it seems there are finally broad standards emerging for AR and VR. And so in Version 13.3 we’re able to begin delivering what we hope will provide widely accessible AR and VR deployment from the Wolfram Language.

At an underlying level, what we’re doing is supporting the USD and GLTF geometry representation formats. But we’re also building a higher-level interface that allows anyone to “publish” 3D geometry for AR and VR.

Given a piece of geometry (which for now can’t involve too many polygons), all you do is apply ARPublish:

The result is a cloud object that has a certain underlying UUID, but is displayed in a notebook as a QR code. Now all you do is look at this QR code with your phone (or tablet, etc.) camera, and press the URL it extracts.

The result will be that the geometry you published with ARPublish now appears in AR on your phone:

Move your phone and you’ll see that your geometry has been realistically placed into the scene. You can also go to a VR “object” mode in which you can manipulate the geometry on your phone.

“Under the hood” there are some slightly elaborate things going on—particularly in providing the appropriate data to different kinds of phones. But the result is a first step in the process of easily being able to get AR and VR output from the Wolfram Language—deployed in whatever devices support AR and VR.

Getting the Details Right: The Continuing Story

In every version of Wolfram Language we add all sorts of fundamentally new capabilities. But we also work to fill in details of existing capabilities, continually pushing to make them as general, consistent and accurate as possible. In Version 13.3 there are many details that have been “made right”, in many different areas.

Here’s one example: the comparison (and sorting) of Around objects. Here are 10 random “numbers with uncertainty”:

These sort by their central value:

But if we look at these, many of their uncertainty regions overlap:

So when should we consider a particular number-with-uncertainty “greater than” another? In Version 13.3 we carefully take into account uncertainty when making comparisons. So, for example, this gives True:

But when there’s too big an uncertainty in the values, we no longer consider the ordering “certain enough”:
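One way to decide such comparisons is to ask whether the difference of the central values is large compared with the combined uncertainty. The sketch below returns True or False only when x − y is more than k combined standard uncertainties away from zero, and None (“undecided”) otherwise. Several parts of it are assumptions, since the text doesn’t say what criterion Version 13.3 actually uses:

- the threshold k = 2;
- treating the errors as independent;
- the class name Uncertain.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uncertain:
    """A value with a standard uncertainty (hypothetical, loosely analogous to Around)."""
    value: float
    uncertainty: float


def compare_greater(x: Uncertain, y: Uncertain, k: float = 2.0) -> Optional[bool]:
    """True if x is clearly greater than y, False if clearly smaller, None if the
    difference is within k combined standard uncertainties (an arbitrary threshold)."""
    diff = x.value - y.value
    sigma = math.hypot(x.uncertainty, y.uncertainty)  # assumes independent errors
    if diff > k * sigma:
        return True
    if diff < -k * sigma:
        return False
    return None


print(compare_greater(Uncertain(10.0, 1.0), Uncertain(5.0, 1.0)))  # True
print(compare_greater(Uncertain(5.5, 1.0), Uncertain(5.0, 1.0)))   # None: too uncertain
```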

Here’s another example of consistency: the applicability of Duration. We introduced Duration to apply to explicit time constructs, things like Audio objects, etc. But in Version 13.3 it also applies to entities for which there’s a reasonable way to define a “duration”:

Dates (and times) are complicated things—and we’ve put a lot of effort into handling them correctly and consistently in the Wolfram Language. One concept that we introduced a few years ago is date granularity: the (subtle) analog of numerical precision for dates. But at first only some date functions supported granularity; now in Version 13.3 all date functions include a DateGranularity option—so that granularity can consistently be tracked through all date-related operations:

Also in dates, something that’s been added, particularly for astronomy, is the ability to deal with “years” specified by real numbers:
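In astronomy a real-number year is often a Julian epoch. J2000.0 is 2000 January 1, 12:00 TT, and each unit is a Julian year of exactly 365.25 days. The Python sketch below converts in both directions under that convention. It treats TT as a naive datetime, so it ignores the TT/UTC offset. The text doesn’t say which convention the Wolfram Language uses for real-number years, so treat this purely as an illustration.

```python
from datetime import datetime, timedelta

# Epoch J2000.0 = 2000-01-01 12:00 TT, stored as a naive datetime (TT/UTC offset ignored).
J2000 = datetime(2000, 1, 1, 12, 0, 0)
JULIAN_YEAR = timedelta(days=365.25)


def julian_epoch_to_datetime(year: float) -> datetime:
    """Convert a real-number Julian-epoch year (e.g. 2024.3) to a datetime."""
    return J2000 + (year - 2000.0) * JULIAN_YEAR


def datetime_to_julian_epoch(dt: datetime) -> float:
    """Inverse conversion: a datetime to a real-number Julian-epoch year."""
    return 2000.0 + (dt - J2000) / JULIAN_YEAR


print(julian_epoch_to_datetime(2001.0))  # 2000-12-31 18:00:00 (365.25 days after J2000.0)
```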

And one consequence of this is that it becomes easier to make a plot of something like astronomical distance as a function of time:

Also in astronomy, we’ve been steadily extending our capabilities to consistently fill in computations for more situations. In Version 13.3, for example, we can now compute sunrise, etc. not just from points on Earth, but from points anywhere in the solar system:

By the way, we’ve also made the computation of sunrise more precise. So now if you ask for the position of the Sun right at sunrise you’ll get a result like this:

How come the altitude of the Sun is not zero at sunrise? That’s because the disk of the Sun is of nonzero size, and “sunrise” is defined to be when any part of the Sun pokes over the horizon.
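The usual almanac convention puts the Sun’s center at an altitude of about −0.833° at sunrise and sunset. That figure combines two effects:

- −16′ for the solar semidiameter (the disk effect just described);
- about −34′ for atmospheric refraction.

The Python sketch below shows how that threshold lengthens the day. It is a hypothetical helper that assumes a Sun at fixed declination and approximates 15° of hour angle per hour. On the equator at an equinox, it gives about 12.11 hours instead of exactly 12. Whether Wolfram’s sunrise computation uses exactly this convention isn’t stated here.

```python
import math


def day_length_hours(lat_deg, decl_deg, h0_deg=-0.833):
    """Approximate hours from sunrise to sunset for a Sun at fixed declination.
    h0_deg is the altitude of the Sun's center at sunrise/sunset."""
    phi, dec, h0 = map(math.radians, (lat_deg, decl_deg, h0_deg))
    cos_h = (math.sin(h0) - math.sin(phi) * math.sin(dec)) / (math.cos(phi) * math.cos(dec))
    if cos_h <= -1.0:
        return 24.0  # the Sun never gets down to h0: no sunset
    if cos_h >= 1.0:
        return 0.0   # the Sun never gets up to h0: no sunrise
    hour_angle = math.degrees(math.acos(cos_h))  # hour angle at sunrise, in degrees
    return 2 * hour_angle / 15.0                  # approx. 15 degrees of hour angle per hour


print(day_length_hours(0.0, 0.0, 0.0))  # center exactly on the horizon: 12 hours
print(day_length_hours(0.0, 0.0))       # conventional -0.833 degrees: about 12.11 hours
```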

Even Easier to Type: Affordances for Wolfram Language Input

Back in 1988 when what’s now Wolfram Language first existed, the only way to type it was like ordinary text. But gradually we’ve introduced more and more “affordances” to make it easier and faster to type correct Wolfram Language input. In 1996, with Version 3, we introduced automatic spacing (and spanning) for operators, as well as brackets that flashed when they matched—and things like -> being automatically replaced by →. Then in 2007, with Version 6, we introduced—with some trepidation at first—syntax coloring. We’d had a way to request autocompletion of a symbol name all the way back to the beginning, but it’d never been good or efficient enough for us to make it happen all the time as you type. But in 2012, for Version 9, we created a much more elaborate autocomplete system—that was useful and efficient enough that we turned it on for all notebook input. A key feature of this autocomplete system was its context-sensitive knowledge of the Wolfram Language, and how and where different symbols and strings typically appear. Over the past decade, we’ve gradually refined this system to the point where I, for one, deeply rely on it.

In recent versions, we’ve made other “typability” improvements. For example, in Version 12.3, we generalized the -> to → transformation to a whole collection of “auto operator renderings”. Then in Version 13.0 we introduced “automatching” of brackets, in which, for example, if you enter [ at the end of what you’re typing, you’ll automatically get a matching ].

Making “typing affordances” work smoothly is a painstaking and tricky business. But in every recent version we’ve steadily been adding more features that—in very “natural” ways—make it easier and faster to type Wolfram Language input.

In Version 13.3 one major change is an enhancement to autocompletion. Instead of just showing pure completions in which characters are appended to what’s already been typed, the autocompletion menu now includes “fuzzy completions” that fill in intermediate characters, change capitalization, etc.

So, for example, if you type “lp” you now get ListPlot as a completion (the little underlines indicate where the letters you actually type appear):
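A simple way to get fuzzy completions like “lp” → ListPlot is subsequence matching:

- Accept any name that contains the typed characters in order, ignoring case.
- Rank first the names whose capital letters spell the query, then shorter names.

This Python sketch illustrates that idea. The ranking rule and the short name list are assumptions, and this is not the actual notebook autocompletion algorithm.

```python
def is_fuzzy_match(query: str, name: str) -> bool:
    """True if the characters of `query` appear in `name` in order (case-insensitive)."""
    remaining = iter(name.lower())
    # `ch in remaining` consumes the iterator, so matches must occur in order.
    return all(ch in remaining for ch in query.lower())


def camel_score(query: str, name: str) -> int:
    """How many query characters can be matched, in order, to capital letters of name."""
    i = 0
    for cap in (c.lower() for c in name if c.isupper()):
        if i < len(query) and cap == query[i].lower():
            i += 1
    return i


def fuzzy_complete(query: str, names):
    hits = [n for n in names if is_fuzzy_match(query, n)]
    # Prefer names whose capitals spell the query, then shorter names.
    return sorted(hits, key=lambda n: (-camel_score(query, n), len(n), n))


NAMES = ["ListPlot", "ListLinePlot", "LinearProgramming", "Plot", "Log",
         "ResourceFunction", "RandomReal"]
print(fuzzy_complete("lp", NAMES))  # ListPlot is ranked first
print(fuzzy_complete("rf", NAMES))  # ['ResourceFunction']
```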

From a design point of view one thing that’s important about this is that it further removes the “short name” premium—and weights things even further on the side of wanting names that explain themselves when they’re read, rather than names that are easy to type in an unassisted way. With the Wolfram Function Repository it’s become increasingly common to want to type ResourceFunction. And we’d been thinking that perhaps we should have a special, short notation for that. But with the new autocompletion, one can operationally just press three keys (r, f, Enter) to get to ResourceFunction:

When one designs something and gets the design right, people usually don’t notice; things just “work as they expect”. But when there’s a design error, that’s when people notice—and are frustrated by—the design. But then there’s another case: a situation where, for example, there are two things that could happen, and sometimes one wants one, and sometimes the other. In doing the design, one has to pick a particular branch. And when this happens to be the branch people want, they don’t notice, and they’re happy. But if they want the other branch, it can be confusing and frustrating.

In the design of the Wolfram Language one of the things that has to be chosen is the precedence for every operator: a + b × c means a + (b × c) because × has higher precedence than +. Often the correct order of precedences is fairly obvious. But sometimes it’s simply impossible to make everyone happy all the time. And so it is with → and &. It’s very convenient to be able to add & at the end of something you type, and make it into a pure function. But that means if you type a → b & it’ll turn the whole thing into a function: (a → b) &. When functions have options, however, one often wants things like name → function. The natural tendency is to type this as name → body &. But this will mean (name → body) & rather than name → (body &). And, yes, when you try to run the function, it’ll notice it doesn’t have correct arguments and options specified. But you’d like to know that what you’re typing isn’t right as soon as you type it. And now in Version 13.3 we have a mechanism for that. As soon as you enter & to “end a function”, you’ll see the extent of the function flash:

And, yup, you can see that’s wrong. Which gives you the chance to fix it as:

There’s another notebook-related update in Version 13.3 that isn’t directly related to typing, but will help in the construction of easy-to-navigate user interfaces. We’ve had ActionMenu since 2007—but it’s only been able to create one-level menus. In Version 13.3 it’s been extended to arbitrary hierarchical menus:

Again not directly related to typing, but now relevant to managing and editing code, there’s an update in Version 13.3 to package editing in the notebook interface. Bring up a .wl file and it’ll appear as a notebook. But its default toolbar is different from the usual notebook toolbar (and is newly designed in Version 13.3):

Go To now gives you a way to immediately go to the definition of any function whose name matches what you type, as well as any section, etc.:

The numbers on the right here are code line numbers; you can also go directly to a specific line number by typing :nnn.

The Elegant Code Project

One of the central goals—and achievements—of the Wolfram Language is to create a computational language that can be used not only as a way to tell computers what to do, but also as a way to communicate computational ideas for human consumption. In other words, Wolfram Language is intended not only to be written by humans (for consumption by computers), but also to be read by humans.

Crucial to this is the broad consistency of the Wolfram Language, as well as its use of carefully chosen natural-language-based names for functions, etc. But what can we do to make Wolfram Language as easy and pleasant as possible to read? In the past we’ve balanced our optimization of the appearance of Wolfram Language between reading and writing. But in Version 13.3 we’ve got the beginnings of our Elegant Code project—to find ways to render Wolfram Language to be specifically optimized for reading.

As an example, here’s a small piece of code (from my An Elementary Introduction to the Wolfram Language), shown in the default way it’s rendered in notebooks:

But in Version 13.3 you can use Format > Screen Environment > Elegant to set a notebook to use the current version of “elegant code”:

(And, yes, this is what we’re actually using for code in this post, as well as some other recent ones.) So what’s the difference? First of all, we’re using a proportionally spaced font that makes the names (here of symbols) easy to “read like words”. And second, we’re adding space between these “words”, and graying back “structural elements” like … and … . When you write a piece of code, things like these structural elements need to stand out enough for you to “see they’re right”. But when you’re reading code, you don’t need to pay as much attention to them. Because the Wolfram Language is so based on “word-like” names, you can typically “understand what it’s saying” just by “reading these words”.

Of course, making code “elegant” is not just a question of formatting; it’s also a question of what’s actually in the code. And, yes, as with writing text, it takes effort to craft code that “expresses itself elegantly”. But the good news is that the Wolfram Language—through its uniquely broad and high-level character—makes it surprisingly straightforward to create code that expresses itself extremely elegantly.

But the point now is to make that code not only elegant in content, but also elegant in formatting. In technical documents it’s common to see math that’s at least formatted elegantly. But when one sees code, more often than not, it looks like something only a machine could appreciate. Of course, if the code is in a traditional programming language, it’ll usually be long and not really intended for human consumption. But what if it’s elegantly crafted Wolfram Language code? Well then we’d like it to look as attractive as text and math. And that’s the point of our Elegant Code project.

There are many tradeoffs, and many issues to be navigated. But in Version 13.3 we’re definitely making progress. Here’s an example that doesn’t have so many “words”, but where the elegant code formatting still makes the “blocking” of the code more obvious:

Here’s a slightly longer piece of code, where again the elegant code formatting helps pull out “readable” words, as well as making the overall structure of the code more obvious:

Particularly in recent years, we’ve added many mechanisms to let one write Wolfram Language that’s easier to read. There are the auto operator renderings, like m[[i]] turning into m〚i〛. And then there are things like the ↦ notation for pure functions. One particularly important element is Iconize, which lets you show any piece of Wolfram Language input in a visually “iconized” form—which nevertheless evaluates just like the corresponding underlying expression:

Iconize lets you effectively hide details (like large amounts of data, option settings, etc.). But sometimes you want to highlight things. You can do it with Style, Framed, Highlighted—and in Version 13.3, Squiggled:

By default, all these constructs persist through evaluation. But in Version 13.3 all of them now have the option StripOnInput, and with this set, you have something that shows up highlighted in an input cell, but where the highlighting is stripped when the expression is actually fed to the Wolfram Language kernel.

These show their highlighting in the notebook:

But when used in input, the highlighting is stripped:

See More Also…

A great strength of the Wolfram Language (yes, perhaps initiated by my original 1988 Mathematica Book) is its detailed documentation—which has now proved valuable not only for human users but also for AIs. Plotting the number of words that appear in the documentation in successive versions, we see a strong progressive increase:

But with all that documentation, and all those new things to be documented, the problem of appropriately crosslinking everything has increased. Even back in Version 1.0, when the documentation was a physical book, there were “See Also’s” between functions:

And by now there’s a complicated network of such See Also’s:

But that’s just the network of how functions point to functions. What about other kinds of constructs? Like formats, characters or entity types—or, for that matter, entries in the Wolfram Function Repository, Wolfram Data Repository, etc. Well, in Version 13.3 we’ve done a first iteration of crosslinking all these kinds of things.

So here now are the “See Also” areas for Graph and Molecule:

Not only are there functions here; there are also other kinds of things that a person (or AI) looking at these pages might find relevant.

It’s great to be able to follow links, but sometimes it’s better just to have material immediately accessible, without following a link. Back in Version 1.0 we made the decision that when a function inherits some of its options from a “base function” (say Plot from Graphics), we only need to explicitly list the non-inherited option values. At the time, this was a good way to save a little paper in the printed book. But now the optimization is different, and finally in Version 13.3 we have a way to show “All Options”—tucked away so it doesn’t distract from the typically-more-important non-inherited options.

Here’s the setup for Plot. First, the list of non-inherited option values:

Then, at the end of the Details section:

which opens to:

Pictures from Words: Generative AI for Images

One of the remarkable things that’s emerged as a possibility from recent advances in AI and neural nets is the generation of images from textual descriptions. It’s not yet realistic to do this at all well on anything but a high-end (and typically server) GPU-enabled machine. But in Version 13.3 there’s now a built-in function ImageSynthesize that can get images synthesized, for now through an external API.

You give text, and ImageSynthesize will try to generate images for which that text is a description:

Sometimes these images will be directly useful in their own right, perhaps as “theming images” for documents or user interfaces. Sometimes they will provide raw material that can be developed into icons or other art. And sometimes they are most useful as inputs to tests or other algorithms.

And one of the important things about ImageSynthesize is that it can immediately be used as part of any Wolfram Language workflow. Pick a random sentence from Alice in Wonderland:

Now ImageSynthesize can “illustrate” it:

Or we can get AI to feed AI:

ImageSynthesize can automatically synthesize images of different sizes:

You can take the output of ImageSynthesize and immediately process it:

ImageSynthesize can not only produce complete images, but can also fill in transparent parts of “incomplete” images:

In addition to ImageSynthesize and all the new LLM functionality, Version 13.3 also includes a number of advances in the core machine learning system for Wolfram Language. Probably the most notable are speedups of up to 10x and beyond for neural net training and evaluation on x86-compatible systems, as well as better models for ImageIdentify. There are also a variety of new networks in the Wolfram Neural Net Repository, particularly ones based on transformers.

Digital Twins: Fitting System Models to Data

It’s been five years since we first began to introduce industrial-scale systems engineering capabilities in the Wolfram Language. The goal is to be able to compute with models of engineering and other systems that can be described by (potentially very large) collections of ordinary differential equations and their discrete analogs. Our separate Wolfram System Modeler product provides an IDE and GUI for graphically creating such models.

For the past five years we’ve been able to do high-efficiency simulation of these models from within the Wolfram Language. And over the past few years we’ve been adding all sorts of higher-level functionality for programmatically creating models, and for systematically analyzing their behavior. A major focus in recent versions has been the synthesis of control systems, and various forms of controllers.

Version 13.3 now tackles a different issue, which is the alignment of models with real-world systems. The idea is to have a model which contains certain parameters, and then to determine these parameters by essentially fitting the model’s behavior to observed behavior of a real-world system.

Let’s start by talking about a simple case where our model is just defined by a single ODE:

This ODE is simple enough that we can find its analytical solution:

So now let’s make some “simulated real-world data” assuming a = 2, and with some noise:

Here’s what the data looks like:

Now let’s try to “calibrate” our original model using this data. It’s a process similar to machine learning training. In this case we make an “initial guess” that the parameter a is 1; then when SystemModelCalibrate runs it shows the “loss” decreasing as the correct value of a is found:

The “calibrated” model does indeed have a ≈ 2:
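To make the "calibration as loss minimization" idea concrete, here is a minimal C sketch. The post's actual ODE isn't reproduced in this text, so the sketch assumes a hypothetical model x'(t) = -a x(t), x(0) = 1. It simulates the model with small forward-Euler steps, generates noisy "observed" data with a = 2, and then runs plain gradient descent from an initial guess a = 1, printing the loss as it falls. The gradient comes from the sensitivity equation s' = -x - a s for s = dx/da. This only illustrates the general principle; it is not how SystemModelCalibrate is implemented, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define N_SAMPLES 50   /* observations at t = 0.00, 0.05, ..., 2.45 */
#define SUBSTEPS  50   /* Euler steps between observations */
#define DT        0.001

/* Hypothetical model x' = -a x, x(0) = 1, simulated with forward Euler.
   Also integrates the sensitivity s = dx/da, which obeys s' = -x - a s. */
static void simulate(double a, double x_out[], double s_out[]) {
    double x = 1.0, s = 0.0;
    for (int i = 0; i < N_SAMPLES; i++) {
        x_out[i] = x;
        s_out[i] = s;
        for (int k = 0; k < SUBSTEPS; k++) {
            double dx = -a * x;
            double ds = -x - a * s;
            x += DT * dx;
            s += DT * ds;
        }
    }
}

/* Deterministic pseudo-noise in [-0.01, 0.01) so runs are reproducible. */
static double noise(unsigned *state) {
    *state = *state * 1103515245u + 12345u;
    return ((*state >> 16) & 0x7fff) / 32768.0 * 0.02 - 0.01;
}

/* "Simulated real-world data": the model at the true a, plus noise. */
static void make_data(double true_a, double y[]) {
    double x[N_SAMPLES], s[N_SAMPLES];
    unsigned state = 42u;
    simulate(true_a, x, s);
    for (int i = 0; i < N_SAMPLES; i++)
        y[i] = x[i] + noise(&state);
}

/* Mean squared error (the "loss"); optionally also its derivative in a. */
static double loss(double a, const double y[], double *grad) {
    double x[N_SAMPLES], s[N_SAMPLES], sum = 0.0, g = 0.0;
    simulate(a, x, s);
    for (int i = 0; i < N_SAMPLES; i++) {
        double r = x[i] - y[i];
        sum += r * r;
        g += 2.0 * r * s[i];
    }
    if (grad != NULL) *grad = g / N_SAMPLES;
    return sum / N_SAMPLES;
}

/* Gradient descent from an initial guess, printing the loss as it decreases. */
static double calibrate(double a0, const double y[]) {
    assert(a0 > 0.0);              /* a decay rate should start positive */
    double a = a0, grad;
    for (int step = 0; step < 2000; step++) {
        double l = loss(a, y, &grad);
        if (step % 250 == 0)
            printf("step %4d  a = %.4f  loss = %.6f\n", step, a, l);
        a -= 2.0 * grad;           /* learning rate 2.0 */
    }
    return a;
}
```

Calling make_data(2.0, y) followed by calibrate(1.0, y) should move a from 1 toward roughly 2, with the printed loss shrinking along the way. Euler integration is used instead of the closed-form solution only so the sketch needs no math library.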

Now we can compare the calibrated model with the data:

As a slightly more realistic engineering-style example let’s look at a model of an electric motor (with both electrical and mechanical parts):

Let’s say we’ve got some data on the behavior of the motor; here we’ve assumed that we’ve measured the angular velocity of a component in the motor as a function of time. Now we can use this data to calibrate parameters of the model (here the resistance of a resistor and the damping constant of a damper):

Here are the fitted parameter values:

And here’s a full plot of the angular velocity data, together with the fitted model and its 95% confidence bands:

SystemModelCalibrate can be used not only in fitting a model to real-world data, but also for example in fitting simpler models to more complicated ones, making possible various forms of “model simplification”.

Symbolic Testing Framework

The Wolfram Language is by many measures one of the world’s most complex pieces of software engineering. And over the decades we’ve developed a large and powerful system for testing and validating it. A decade ago—in Version 10—we began to make some of our internal tools available for anyone writing Wolfram Language code. Now in Version 13.3 we’re introducing a more streamlined—and “symbolic”—version of our testing framework.

The basic idea is that each test is represented by a symbolic TestObject, created using TestCreate:

On its own, TestObject is an inert object. You can run the test it represents using TestEvaluate:

Each test object has a whole collection of properties, some of which only get filled in when the test is run:

It’s very convenient to have symbolic test objects that one can manipulate using standard Wolfram Language functions, say selecting tests with particular features, or generating new tests from old. And when one builds a test suite, one does it just by making a list of test objects.

This makes a list of test objects (and, yes, there’s some trickiness because TestCreate needs to keep unevaluated the expression that’s going to be tested):

Given these tests, we can now generate a report from running them:
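The core idea here is that a test is inert data until something runs it. The following C sketch illustrates that idea; it is not the Wolfram Language framework. A hypothetical Test struct holds the expression under test as a function pointer (so it stays "unevaluated"), the expected value, and result properties that are filled in only when the test is evaluated. A suite is simply an array of such objects, and the report function runs them all, echoing each evaluation event. All of the names (test_create, test_evaluate, test_report) are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { NOT_RUN, SUCCESS, FAILURE } Outcome;

/* A "test object": inert data. The expression under test is kept
   unevaluated by storing it as a function pointer. */
typedef struct {
    const char *name;
    long (*expr)(void);   /* the unevaluated expression */
    long expected;
    /* Properties filled in only when the test is evaluated: */
    Outcome outcome;
    long actual;
} Test;

static Test test_create(const char *name, long (*expr)(void), long expected) {
    Test t = { name, expr, expected, NOT_RUN, 0 };
    return t;
}

/* Runs one test and records its outcome in the object itself. */
static Outcome test_evaluate(Test *t) {
    assert(t->expr != NULL);
    t->actual = t->expr();
    t->outcome = (t->actual == t->expected) ? SUCCESS : FAILURE;
    return t->outcome;
}

typedef struct { int succeeded, failed; } Report;

/* A test suite is just an array of test objects; the report runs each one
   and echoes every "evaluated" event as it occurs. */
static Report test_report(Test suite[], int n) {
    Report r = { 0, 0 };
    for (int i = 0; i < n; i++) {
        Outcome o = test_evaluate(&suite[i]);
        printf("evaluated %-8s %s\n", suite[i].name,
               o == SUCCESS ? "success" : "failure");
        if (o == SUCCESS) r.succeeded++; else r.failed++;
    }
    return r;
}

/* Example expressions under test. */
static long two_plus_two(void) { return 2 + 2; }
static long six_times_seven(void) { return 6 * 7; }
```

For instance, a suite containing test_create("2+2", two_plus_two, 4) and a deliberately wrong test_create("6*7", six_times_seven, 41) would report one success and one failure, with the failing object's actual field recording 42. Because tests are plain data, selecting or generating them is ordinary array manipulation.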

TestReport has various options that allow you to monitor and control the running of a test suite. For example, here we’re saying to echo every "TestEvaluated" event that occurs:

Did You Get That Math Right?

Most of what the Wolfram Language is about is taking inputs from humans (as well as programs, and now AIs) and computing outputs from them. But a few years ago we started introducing capabilities for having the Wolfram Language ask questions of humans, and then assessing their answers.

In recent versions we’ve been building up sophisticated ways to construct and deploy “quizzes” and other collections of questions. But one of the core issues is always how to determine whether a person has answered a particular question correctly. Sometimes that’s easy to determine. If we ask “What is 2 + 2?”, the answer had better be “4” (or conceivably “four”). But what if we ask a question where the answer is some algebraic expression? The issue is that there may be many mathematically equal forms of that expression. And whether one considers a particular form to be the “right answer” depends on exactly what one is asking.

For example, here we’re computing a derivative:

And here we’re doing a factoring problem:

These two answers are mathematically equal. And they’d both be “reasonable answers” for the derivative if it appeared as a question in a calculus course. But in an algebra course, one wouldn’t want to consider the unfactored form a “correct answer” to the factoring problem, even though it’s “mathematically equal”.

And to deal with these kinds of issues, we’re introducing in Version 13.3 more detailed mathematical assessment functions. With a "CalculusResult" assessment function, it’s OK to give the unfactored form:

But with a "PolynomialResult" assessment function, the algebraic form of the expression has to be the same for it to be considered “correct”:

There’s also another type of assessment function—"ArithmeticResult"—which only allows trivial arithmetic rearrangements, so that it considers 2 + 3 equivalent to 3 + 2, but doesn’t consider 2/3 equivalent to 4/6:

Here’s how you’d build a question with this:

And now if you type “2/3” it’ll say you’ve got it right, but if you type “4/6” it won’t. However, if you use, say, "CalculusResult" in the assessment function, it’ll say you got it right even if you type “4/6”.
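The difference between a strict and a lenient assessment can be sketched in C for the simple fraction case. The real assessment functions' exact rules aren't spelled out here, so these comparisons are simplified, hypothetical stand-ins. same_form behaves like the strict check described above: 2/3 is accepted but 4/6 is not. mathematically_equal accepts any equal value, using cross-multiplication to avoid floating-point comparison. same_sum_terms permits only reordering of two addends.

```c
#include <assert.h>
#include <stdbool.h>

/* An answer typed as a fraction num/den (den must be nonzero). */
typedef struct { long num, den; } Fraction;

/* Strict check: the answer must be written exactly like the key,
   so 4/6 is NOT accepted for 2/3. */
static bool same_form(Fraction answer, Fraction key) {
    return answer.num == key.num && answer.den == key.den;
}

/* Lenient check: accept any mathematically equal value, so 4/6 IS
   accepted for 2/3. Cross-multiplication avoids floating point. */
static bool mathematically_equal(Fraction answer, Fraction key) {
    assert(answer.den != 0 && key.den != 0);
    return answer.num * key.den == key.num * answer.den;
}

/* Sums a + b where only reordering is allowed: 2 + 3 matches 3 + 2. */
static bool same_sum_terms(long a1, long b1, long a2, long b2) {
    return (a1 == a2 && b1 == b2) || (a1 == b2 && b1 == a2);
}
```

With key = {2, 3}, same_form accepts {2, 3} and rejects {4, 6}, while mathematically_equal accepts both.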

Streamlining Parallel Computation

Ever since the mid-1990s there’s been the capability to do parallel computation in the Wolfram Language. And certainly for me it’s been critical in a whole range of research projects I’ve done. I currently have 156 cores routinely available in my “home” setup, distributed across 6 machines. It’s sometimes challenging from a system administration point of view to keep all those machines and their networking running as one wants. And one of the things we’ve been doing in recent versions—and now completed in Version 13.3—is to make it easier from within the Wolfram Language to see and manage what’s going on.

It all comes down to specifying the configuration of kernels. And in Version 13.3 that’s now done using symbolic KernelConfiguration objects. Here’s an example of one:

There’s all sorts of information in the kernel configuration object:

It describes “where” a kernel with that configuration will be, how to get to it, and how it should be launched. The kernel might just be local to your machine. Or it might be on a remote machine, accessible through ssh, or https, or our own wstp (Wolfram Symbolic Transport Protocol) or lwg (Lightweight Grid) protocols.

In Version 13.3 there’s now a GUI for setting up kernel configurations:

The Kernel Configuration Editor lets you enter all the details that are needed, about network connections, authentication, locations of executables, etc.

But once you’ve set up a KernelConfiguration object, that’s all you ever need—for example to say “where” to do a remote evaluation:

ParallelMap and other parallel functions then just work by doing their computations on kernels specified by a list of KernelConfiguration objects. You can set up the list in the Kernels Settings GUI:

Here’s my personal default collection of parallel kernels:

This now counts the number of individual kernels running on each machine specified by these configurations:

Another convenient new feature in Version 13.3 is support for named collections of kernels. For example, this runs a single “representative” kernel on each distinct machine:
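The bookkeeping behind "count kernels per machine" and "pick one representative per machine" amounts to a group-by over host names. The following C sketch makes a simplifying assumption: each running kernel is identified only by a host string. The host names and the HostSummary type are hypothetical, and this is not how the Wolfram Language tracks kernels.

```c
#include <assert.h>
#include <string.h>

#define MAX_HOSTS 16

/* Per-host summary: how many kernels, and the first one seen there. */
typedef struct { const char *host; int count; int first_kernel; } HostSummary;

/* Counts kernels per distinct host and records the index of the first kernel
   on each host as its "representative". Returns the number of hosts. */
static int summarize_hosts(const char *kernel_hosts[], int n, HostSummary out[]) {
    int hosts = 0;
    for (int i = 0; i < n; i++) {
        int j = 0;
        while (j < hosts && strcmp(out[j].host, kernel_hosts[i]) != 0) j++;
        if (j == hosts) {                 /* first kernel on a new host */
            assert(hosts < MAX_HOSTS);
            out[hosts].host = kernel_hosts[i];
            out[hosts].count = 0;
            out[hosts].first_kernel = i;
            hosts++;
        }
        out[j].count++;
    }
    return hosts;
}
```

For kernels on hosts {"alpha", "beta", "alpha", "gamma", "beta", "alpha"}, this yields three hosts with counts 3, 2 and 1, and representatives at indices 0, 1 and 3.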

Just Call That C Function! Direct Access to External Libraries

Let’s say you’ve got an external library written in C—or in some other language that can compile to a C-compatible library. In Version 13.3 there’s now foreign function interface (FFI) capability that allows you to directly call any function in the external library just using Wolfram Language code.

Here’s a very trivial C function:

This function happens to be included in compiled form in the compilerDemoBase library that’s part of Wolfram Language documentation. Given this library, you can use ForeignFunctionLoad to load the library and create a Wolfram Language function that directly calls the C addone function. All you need do is specify the library and C function, and then give the type signature for the function:

Now ff is a Wolfram Language function that calls the C addone function:
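From the C side, a function of this kind is ordinary code. The following is an illustrative guess at what an addone-style function might look like, assuming an int-to-int signature. The actual compilerDemoBase source isn't reproduced in this text.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative addone-style function (assumed signature int -> int).
   Non-static, so it would be exported when built as a shared library. */
int addone(int x) {
    assert(x < INT_MAX);   /* x + 1 would overflow, which is undefined in C */
    return x + 1;
}
```

Compiled into a shared library (for example with cc -shared -fPIC), a function like this is what a foreign function interface can locate by name and call, given a matching type signature.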

The C function addone happens to have a particularly simple type signature, which can immediately be represented in terms of compiler types that have direct analogs as Wolfram Language expressions. But in working with low-level languages, it’s very common to have to deal directly with raw memory, which is something that never happens when you’re purely working at the Wolfram Language level.

So, for example, in the OpenSSL library there’s a function called RAND_bytes, whose C type signature is:

And the important thing to notice is that this contains a pointer to a buffer buf that gets filled by RAND_bytes. If you were calling RAND_bytes from C, you’d first allocate memory for this buffer, then—after calling RAND_bytes—read back whatever was written to the buffer. So how can you do something analogous when you’re calling RAND_bytes using ForeignFunction in Wolfram Language? In Version 13.3 we’re introducing a family of constructs for working with pointers and raw memory.
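In plain C, OpenSSL declares int RAND_bytes(unsigned char *buf, int num) in <openssl/rand.h>, and it returns 1 on success. The caller-side pattern described above (allocate, let the callee write, read back, free) can be sketched as follows. To avoid a dependency on OpenSSL, the sketch uses a hypothetical stand-in, fill_bytes, with the same shape as RAND_bytes. Its output is deliberately not random.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in with the same shape as OpenSSL's RAND_bytes:
   writes num bytes into the caller's buffer, returns 1 on success.
   The bytes are a fixed pattern, NOT random numbers. */
static int fill_bytes(unsigned char *buf, int num) {
    if (buf == NULL || num < 0) return 0;
    for (int i = 0; i < num; i++)
        buf[i] = (unsigned char)(i * 37 + 11);
    return 1;
}

/* Caller-side pattern: allocate, let the callee write, read back, free. */
static int read_filled_buffer(unsigned char *out, int num) {
    assert(num > 0);
    unsigned char *buf = malloc((size_t)num);   /* 1. allocate the buffer */
    if (buf == NULL) return 0;
    int ok = fill_bytes(buf, num);              /* 2. callee fills it */
    if (ok == 1)
        for (int i = 0; i < num; i++)
            out[i] = buf[i];                    /* 3. read the results back */
    free(buf);                                  /* 4. release the memory */
    return ok;
}
```

With a real OpenSSL installation, swapping fill_bytes for RAND_bytes (and linking with -lcrypto) gives the genuine version. RawMemoryAllocate and RawMemoryImport play the roles of steps 1 and 3 on the Wolfram Language side.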

So, for example, here’s how we can create a Wolfram Language foreign function corresponding to RAND_bytes:

But to actually use this, we need to be able to allocate the buffer, which in Version 13.3 we can do with RawMemoryAllocate:

This creates a buffer that can store 10 unsigned chars. Now we can call rb, giving it this buffer:

rb will fill the buffer—and then we can import the results back into Wolfram Language:

There’s some complicated stuff going on here. RawMemoryAllocate does ultimately allocate raw memory—and you can see its hex address in the symbolic object that’s returned. But RawMemoryAllocate creates a ManagedObject, which keeps track of whether it’s being referenced, and automatically frees the memory that’s been allocated when nothing references it anymore.

Long ago languages like BASIC provided PEEK and POKE functions for reading and writing raw memory. It was always a dangerous thing to do—and it’s still dangerous. But it’s somewhat higher level in Wolfram Language, where in Version 13.3 there are now functions like RawMemoryRead and RawMemoryWrite. (For writing data into a buffer, RawMemoryExport is also relevant.)

Most of the time it’s very convenient to deal with memory-managed ManagedObject constructs. But for the full low-level experience, Version 13.3 provides UnmanageObject, which disconnects automatic memory management for a managed object, and requires you to explicitly use RawMemoryFree to free it.
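The managed/unmanaged distinction resembles reference counting. The following toy C sketch is a hypothetical illustration of the concept, not how ManagedObject is implemented. A buffer is freed automatically when its last reference is released, unless it has been "unmanaged", in which case the caller must free it explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy "managed" buffer: a reference count decides when memory is freed. */
typedef struct {
    unsigned char *data;
    size_t size;
    int refcount;
    bool managed;   /* false once unmanaged: caller must free explicitly */
} Managed;

static Managed *managed_alloc(size_t size) {
    Managed *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->data = calloc(size, 1);
    if (m->data == NULL) { free(m); return NULL; }
    m->size = size;
    m->refcount = 1;   /* the creator holds the first reference */
    m->managed = true;
    return m;
}

static void managed_retain(Managed *m) { m->refcount++; }

/* Drops a reference; returns true if this call freed the memory. */
static bool managed_release(Managed *m) {
    assert(m->refcount > 0);
    if (--m->refcount == 0 && m->managed) {
        free(m->data);
        free(m);
        return true;
    }
    return false;
}

/* Turns off automatic freeing; the caller now owns the memory. */
static void unmanage(Managed *m) { m->managed = false; }

/* Explicit free, required for unmanaged buffers. */
static void raw_free(Managed *m) {
    free(m->data);
    free(m);
}
```

Releasing a managed buffer twice after one extra retain frees it on the second release. An unmanaged buffer survives its last release and leaks unless raw_free is called. That is the same trade-off UnmanageObject and RawMemoryFree expose.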

One feature of C-like languages is the concept of a function pointer. And normally the function that the pointer is pointing to is just something like a C function. But in Version 13.3 there’s another possibility: it can be a function defined in Wolfram Language. Or, in other words, from within an external C function it’s possible to call back into the Wolfram Language.

Let’s use this C program:

You can actually compile it right from Wolfram Language using:

Now we load frun as a foreign function—with a type signature that uses "OpaqueRawPointer" to represent the function pointer:

What we need next is to create a function pointer that points to a callback to Wolfram Language:

The Wolfram Language function here is just Echo. But when we call frun with the cbfun function pointer we can see our C code calling back into Wolfram Language to evaluate Echo:
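On the C side, a callback is simply a call through a function pointer; the C code neither knows nor cares what sits behind the pointer. The C program used above isn't reproduced in this text, so the frun below is a hypothetical stand-in. It takes an int-to-int function pointer and calls it on 1..n. echo_int stands in for the Wolfram Language side: like Echo, it prints its argument and returns it unchanged.

```c
#include <assert.h>
#include <stdio.h>

/* Type of the function pointer the C code receives. */
typedef int (*callback_fn)(int);

/* Hypothetical frun-style function: calls back through f for 1..n
   and sums the results. */
int frun(callback_fn f, int n) {
    assert(f != NULL);
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += f(i);   /* control passes to whoever supplied f */
    return total;
}

/* Stand-in for the Wolfram Language side: print, then return the argument. */
static int echo_int(int x) {
    printf(">> %d\n", x);
    return x;
}
```

Calling frun(echo_int, 3) prints three lines and returns 6. With a callback into Wolfram Language, the pointer passed as f would instead lead back into the kernel.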

ForeignFunctionLoad provides an extremely convenient way to call external C-like functions directly from top-level Wolfram Language. But if you’re calling C-like functions a great many times, you’ll sometimes want to do it using compiled Wolfram Language code. And you can do this using the LibraryFunctionDeclaration mechanism that was introduced in Version 13.1. It’ll be more complicated to set up, and it’ll require an explicit compilation step, but there’ll be slightly less “overhead” in calling the external functions.

The Advance of the Compiler Continues

For several years we’ve had an ambitious project to develop a large-scale compiler for the Wolfram Language. And in each successive version we’re further extending and enhancing the compiler. In Version 13.3 we’ve managed to compile more of the compiler itself (which, needless to say, is written in Wolfram Language)—thereby making the compiler more efficient in compiling code. We’ve also enhanced the performance of the code generated by the compiler—particularly by optimizing memory management done in the compiled code.

Over the past several versions we’ve been steadily making it possible to compile more and more of the Wolfram Language. But it’ll never make sense to compile everything—and in Version 13.3 we’re adding KernelEvaluate to make it more convenient to call back from compiled code to the Wolfram Language kernel.

Here’s an example:

We’ve got an argument n that’s declared as being of type MachineInteger. Then we’re doing a computation on n in the kernel, and using TypeHint to specify that its result will be of type MachineInteger. At least the arithmetic outside the KernelEvaluate can be compiled, even though the KernelEvaluate itself just calls uncompiled code:

There are other enhancements to the compiler in Version 13.3 as well. For example, Cast now allows data types to be cast in a way that directly emulates what the C language does. There’s also now SequenceType, which is a type analogous to the Wolfram Language Sequence construct—and able to represent an arbitrary-length sequence of arguments to a function.
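For reference, these are a few standard C conversion rules that a cast "emulating what the C language does" would have to reproduce. The text doesn't list which conversions Cast supports, so treat this as background on C semantics only. Note that converting a negative or out-of-range floating value to an unsigned type is undefined behavior in C, so it is deliberately not shown.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint8_t) == 1, "uint8_t is one byte");

/* Floating to integer conversion truncates toward zero. */
static int truncate_toward_zero(double x) { return (int)x; }

/* Integer to unsigned conversion wraps modulo 2^N (here N = 8). */
static uint8_t wrap_to_uint8(int x) { return (uint8_t)x; }

/* Narrowing double -> float rounds to float precision, losing bits. */
static double narrow_to_float(double x) { return (double)(float)x; }
```

So (int)3.7 is 3 and (int)-3.7 is -3, (uint8_t)300 is 44 and (uint8_t)-1 is 255, and a round trip of 0.1 through float no longer equals the double 0.1.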

And Much More…

In addition to everything we’ve already discussed here, there are lots of other updates and enhancements in Version 13.3—as well as thousands of bug fixes.

Some of the additions fill out corners of functionality, adding completeness or consistency. Statistical fitting functions like LinearModelFit now accept input in all the various forms (associations, etc.) that machine learning functions like Classify accept. TourVideo now lets you “tour” GeoGraphics, with waypoints specified by geo positions. ByteArray now supports the “corner case” of zero-length byte arrays. The compiler can now handle byte array functions, and additional string functions. Nearly 40 additional special functions can now handle numeric interval computations. BarcodeImage adds support for UPCE and Code93 barcodes. SolidMechanicsPDEComponent adds support for the Yeoh hyperelastic model. And—twenty years after we first introduced export of SVG, there’s now built-in support for import of SVG not only to raster graphics, but also to vector graphics.

There are new “utility” functions like RealValuedNumberQ and RealValuedNumericQ. There’s a new function FindImageShapes that begins the process of systematically finding geometrical forms in images. There are a number of new data structures—like "SortedKeyStore" and "CuckooFilter".

There are also functions whose algorithms—and output—have been improved. ImageSaliencyFilter now uses new machine-learning-based methods. RSolveValue gives cleaner and smaller results for the important case of linear difference equations with constant coefficients.

Download your 13.3 now! (It’s already live in the Wolfram Cloud!)