Mitchell Feigenbaum (1944–2019), 4.66920160910299067185320382…

Stephen Wolfram | July 23, 2019
Mitchell Feigenbaum
(Artwork by Gunilla Feigenbaum)

Behind the Feigenbaum Constant

It’s called the Feigenbaum constant, and it’s about 4.6692016. And it shows up, quite universally, in certain kinds of mathematical—and physical—systems that can exhibit chaotic behavior.

Mitchell Feigenbaum, who died on June 30 at the age of 74, was the person who discovered it—back in 1975, by doing experimental mathematics on a pocket calculator.

It became a defining discovery in the history of chaos theory. But when it was first discovered, it was a surprising, almost bizarre result, that didn’t really connect with anything that had been studied before. Somehow, though, it’s fitting that it should have been Mitchell Feigenbaum—who I knew for nearly 40 years—who would discover it.

Trained in theoretical physics, and a connoisseur of its mathematical traditions, Mitchell always seemed to see himself as an outsider. He looked a bit like Beethoven—and projected a certain stylish sense of intellectual mystery. He would often make strong assertions, usually with a conspiratorial air, a twinkle in his eye, and a glass of wine or a cigarette in his hand.

He would talk in long, flowing sentences which exuded a certain erudite intelligence. But ideas would jump around. Sometimes detailed and technical. Sometimes leaps of intuition that I, for one, could not follow. He was always calculating, staying up until 5 or 6 am, filling yellow pads with formulas and stressing Mathematica with elaborate algebraic computations that might run for hours.

He published very little, and what he did publish he was often disappointed wasn’t widely understood. When he died, he had been working for years on the optics of perception, and on questions like why the Moon appears larger when it’s close to the horizon. But he never got to the point of publishing anything on any of this.

For more than 30 years, Mitchell’s official position (obtained essentially on the basis of his Feigenbaum constant result) was as a professor at the Rockefeller University in New York City. (To fit with Rockefeller’s biological research mission, he was themed as the Head of the “Laboratory of Mathematical Physics”.) But he dabbled elsewhere, lending his name to a financial computation startup, and becoming deeply involved in inventing new cartographic methods for the Hammond World Atlas.

What Mitchell Discovered

The basic idea is quite simple. Take a value x between 0 and 1. Then iteratively replace x by a x (1 – x). Let’s say one starts from x = 1/3, and takes a = 3.2. Then here’s what one gets for the successive values of x:

Successive values

ListLinePlot[NestList[Compile[{x}, 3.2 x (1 - x)], N[1/3], 50], 
 Mesh -> All, PlotRange -> {0, 1}, Frame -> True]
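(The same iteration is easy to reproduce outside the Wolfram Language; here, purely for illustration, is a minimal Python sketch of the computation above, plotting aside:)

```python
# Iterate x -> a x (1 - x), with a = 3.2, starting from x = 1/3
def logistic(x, a=3.2):
    return a * x * (1 - x)

x = 1 / 3
trajectory = [x]
for _ in range(50):
    x = logistic(x)
    trajectory.append(x)

# After a short transient, the values settle into a period-2 cycle,
# alternating between roughly 0.513 and 0.799
```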

After a little transient, the values of x are periodic, with period 2. But what happens with other values of a? Here are a few results for this so-called “logistic map”:

Logistic map

GraphicsGrid[
 Partition[
  Table[Labeled[
    ListLinePlot[NestList[Compile[{x}, a x (1 - x)], N[1/3], 50], 
     Mesh -> All, PlotRange -> {0, 1}, Frame -> True, 
     FrameTicks -> None], StringTemplate["a = ``"][a]], {a, 2.75, 
    4, .25}], 3], Spacings -> {.1, -.1}]

For small a, the values of x quickly go to a fixed point. For larger a they become periodic, first with period 2, then 4. And finally, for larger a, the values start bouncing around seemingly randomly.

One can summarize this by plotting the values of x (here, 300, after dropping the first 50 to avoid transients) reached as a function of the value of a:

Period doublings

ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[{x}, a x (1 - x)], N[1/3], 300], 50], {a, 0, 
    4, .01}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

As a increases, one sees a cascade of “period doublings”. In this case, they’re at a₁ = 3, a₂ ≈ 3.449, a₃ ≈ 3.544090, a₄ ≈ 3.5644072. What Mitchell noticed is that these successive values approach a limit (here a∞ ≈ 3.569946) in a geometric sequence, with a∞ – aₙ ~ δ⁻ⁿ and δ ≈ 4.669.
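(One can already see the geometric convergence from the period-doubling values quoted above; a quick Python check, using just those quoted values, so the ratio is only a rough approximation to δ:)

```python
# Period-doubling points of the logistic map, as quoted above
a = [3.0, 3.449, 3.544090, 3.5644072]

# Ratios of successive gaps between them approach delta ~ 4.669
ratios = [(a[n] - a[n - 1]) / (a[n + 1] - a[n]) for n in range(1, len(a) - 1)]
```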

That’s a nice little result. But here’s what makes it much more significant: it isn’t just true about the specific iterated map x ⟶ a x (1 – x); it’s true about any map like that. Here, for example, is the “bifurcation diagram” for x ⟶ a sin(π √x):

Bifurcation diagram

ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[{x}, a Sin[Pi Sqrt@x]], N[1/3], 300], 50], {a,
    0, 1, .002}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

The details are different. But what Mitchell noticed is that the positions of the period doublings again form a geometric sequence, with the exact same base: δ ≈ 4.669.

It’s not just that different iterated maps give qualitatively similar results; when one measures the convergence rate, this turns out to be exactly and quantitatively the same—always δ ≈ 4.669. And this was Mitchell’s big discovery: a quantitatively universal feature of the approach to chaos in a class of systems.

The Scientific Backstory

The basic idea behind iterated maps has a long history, stretching all the way back to antiquity. Early versions arose in connection with finding successive approximations, say to square roots. For example, using Newton’s method from the late 1600s, √2 can be obtained by iterating x ⟶ 1/x + x/2 (here starting from x = 1):

Starting from x = 1

NestList[Function[x, 1/x + x/2], N[1, 8], 6]
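(The same Newton iteration, sketched in Python for illustration—floating point here rather than the arbitrary precision above:)

```python
# Newton's method for sqrt(2): iterate x -> 1/x + x/2, starting from x = 1
x = 1.0
values = [x]
for _ in range(6):
    x = 1 / x + x / 2
    values.append(x)

# The convergence is quadratic: the number of correct digits
# roughly doubles at each step
```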

The notion of iterating an arbitrary function seems to have first been formalized in an 1870 paper by Ernst Schröder (who was notable for his work in formalizing things from powers to Boolean algebra), although most of the discussion that arose was around solving functional equations, not actually doing iterations. (An exception was the investigation of regions of convergence for Newton’s approximation by Arthur Cayley in 1879.) In 1918 Gaston Julia made a fairly extensive study of iterated rational functions in the complex plane—inventing, if not drawing, Julia sets. But until fractals in the late 1970s (which soon led to the Mandelbrot set), this area of mathematics basically languished.

But quite independent of any pure mathematical developments, iterated maps with forms similar to xa x (1 – x) started appearing in the 1930s as possible practical models in fields like population biology and business cycle theory—usually arising as discrete annualized versions of continuous equations like the Verhulst logistic differential equation from the mid-1800s. Oscillatory behavior was often seen—and in 1954 William Ricker (one of the founders of fisheries science) also found more complex behavior when he iterated some empirical fish reproduction curves.

Back in pure mathematics, versions of iterated maps had also shown up from time to time in number theory. In 1799 Carl Friedrich Gauss effectively studied the map x ⟶ FractionalPart[1/x] in connection with continued fractions. And starting in the late 1800s there was interest in studying maps like x ⟶ FractionalPart[a x] and their connections to the properties of the number a.

Particularly following Henri Poincaré’s work on celestial mechanics around 1900, the idea of sensitive dependence on initial conditions arose, and it was eventually noted that iterated maps could effectively “excavate digits” in their initial conditions. For example, iterating x ⟶ FractionalPart[10 x], starting with the digits of π, gives (effectively just shifting the sequence of digits one place to the left at each step):

Starting with the digits of pi...

N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 5], 10]

ListLinePlot[
 Rest@N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 50], 
   40], Mesh -> All]

(Confusingly enough, with typical “machine precision” computer arithmetic, this doesn’t work correctly, because even though one “runs out of precision”, the IEEE Floating Point standard says to keep on delivering digits, even though they are completely wrong. Arbitrary precision in the Wolfram Language gets it right.)
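(The digit shifting—and the machine-precision failure—can be reproduced in Python, with the `decimal` module standing in for arbitrary precision; the 30-digit value of π below is hard-coded just for this sketch:)

```python
from decimal import Decimal, getcontext

getcontext().prec = 50
pi30 = Decimal("3.141592653589793238462643383279")  # hard-coded digits of pi

# x -> FractionalPart[10 x] shifts the digit sequence one place left per step
x = pi30
shifts = []
for _ in range(5):
    x = (10 * x) % 1
    shifts.append(str(x))

# With 64-bit floats the same iteration exhausts its ~16 digits in a few
# steps, yet keeps delivering (completely wrong) digits rather than stopping
f = 3.141592653589793
for _ in range(30):
    f = (10 * f) % 1
```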

Maps like x ⟶ a x(1 – x) show similar kinds of “digit excavation” behavior (for example, replacing x by sin(π u)², x ⟶ 4 x(1 – x) becomes exactly u ⟶ FractionalPart[2 u])—and this was already known by the 1940s, and, for example, commented on by John von Neumann in connection with his 1949 iterative “middle-square” method for generating pseudorandom numbers by computer.
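(The substitution x = sin(π u)² is easy to verify numerically; a Python sketch—only a few steps are checked, since the doubling map amplifies floating-point error by a factor of 2 per step:)

```python
import math

# Under x = sin(pi u)^2, the map x -> 4 x (1 - x) becomes u -> FractionalPart[2 u]
u = 0.1234
x = math.sin(math.pi * u) ** 2
for _ in range(8):
    x = 4 * x * (1 - x)      # logistic map at a = 4
    u = (2 * u) % 1          # digit-shift (doubling) map
    assert abs(x - math.sin(math.pi * u) ** 2) < 1e-9
```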

But what about doing experimental math on iterated maps? There wasn’t too much experimental math at all on early digital computers (after all, most computer time was expensive). But in the aftermath of the Manhattan Project, Los Alamos had built its own computer (named MANIAC), that ended up being used for a whole series of experimental math studies. And in 1964 Paul Stein and Stan Ulam wrote a report entitled “Non-linear Transformation Studies on Electronic Computers” that included photographs of oscilloscope-like MANIAC screens displaying output from some fairly elaborate iterated maps. In 1971, another “just out of curiosity” report from Los Alamos (this time by Nick Metropolis [leader of the MANIAC project, and developer of the Monte Carlo method], Paul Stein and his brother Myron Stein) started to give more specific computer results for the behavior of logistic maps, and noted the basic phenomenon of period doubling (which they called the “U-sequence”), as well as its qualitative robustness under changes in the underlying map.

But quite separately from all of this, there were other developments in physics and mathematics. In 1963 Ed Lorenz (a meteorologist at MIT) introduced and simulated his “naturally occurring” Lorenz differential equations, that showed sensitive dependence on initial conditions. Starting in the 1940s (but following on from Poincaré’s work around 1900) there’d been a steady stream of developments in mathematics in so-called dynamical systems theory—particularly investigating global properties of the solutions to differential equations. Usually there’d be simple fixed points observed; sometimes “limit cycles”. But by the 1970s, particularly after the arrival of early computer simulations (like Lorenz’s), it was clear that for nonlinear equations something else could happen: a so-called “strange attractor”. And in studying so-called “return maps” for strange attractors, iterated maps like the logistic map again appeared.

But it was in 1975 that various threads of development around iterated maps somehow converged. On the mathematical side, dynamical systems theorist Jim Yorke and his student Tien-Yien Li at the University of Maryland published their paper “Period Three Implies Chaos”, showing that in an iterated map with a particular parameter value, if there’s ever an initial condition that leads to a cycle of length 3, there must be other initial conditions that don’t lead to cycles at all—or, as they put it, show chaos. (As it turned out, Aleksandr Sarkovskii—who was part of a Ukrainian school of dynamical systems research—had already in 1962 proved the slightly weaker result that a cycle of period 3 implies cycles of all periods.)

But meanwhile there had also been growing interest in things like the logistic maps among mathematically oriented population biologists, leading to the rather readable review (published in mid-1976) entitled “Simple Mathematical Models with Very Complicated Dynamics” by physics-trained Australian Robert May, who was then a biology professor at Princeton (and would subsequently become science advisor to the UK government, and is now “Baron May of Oxford”).

But even though things like sketches of bifurcation diagrams existed, the discovery of their quantitatively universal properties had to await Mitchell Feigenbaum and his discovery.

Mitchell’s Journey

Mitchell Feigenbaum grew up in Brooklyn, New York. His father was an analytical chemist, and his mother was a public-school teacher. Mitchell was unenthusiastic about school, though did well on math and science tests, and managed to teach himself calculus and piano. In 1960, at age 16, as something of a prodigy, he enrolled in the City College of New York, officially studying electrical engineering, but also taking physics and math classes. After graduating in 1964, he went to MIT. Initially he was going to do a PhD in electrical engineering, but he quickly switched to physics.

But although he was enamored of classic mathematical physics (as represented, for example, in the books of Landau and Lifshitz), he ended up writing his thesis on a topic set by his advisor about particle physics, and specifically about evaluating a class of Feynman diagrams for the scattering of photons by scalar particles (with lots of integrals, if not special functions). It wasn’t a terribly exciting thesis, but in 1970 he was duly dispatched to Cornell for a postdoc position.

Mitchell struggled with motivation, preferring to hang out in coffee shops doing the New York Times crossword (at which he was apparently very fast) to doing physics. But at Cornell, Mitchell made several friends who were to be important to him. One was Predrag Cvitanović, a star graduate student from what is now Croatia, who was studying quantum electrodynamics, and with whom he shared an interest in German literature. Another was a young poet named Kathleen Doorish (later, Kathy Hammond), who was a friend of Predrag’s. And another was a rising-star physics professor named Pete Carruthers, with whom he shared an interest in classical music.

In the early 1970s quantum field theory was entering a golden age. But despite the topic of his thesis, Mitchell didn’t get involved, and in the end, during his two years at Cornell, he produced no visible output at all. Still, he had managed to impress Hans Bethe enough to be dispatched for another postdoc position, though now at a place lower in the pecking order of physics, Virginia Polytechnic Institute, in rural Virginia.

At Virginia Tech, Mitchell did even less well than at Cornell. He didn’t interact much with people, and he produced only one three-page paper: “The Relationship between the Normalization Coefficient and Dispersion Function for the Multigroup Transport Equation”. As its title might suggest, the paper was quite technical and quite unexciting.

As Mitchell’s two years at Virginia Tech drew to a close it wasn’t clear what was going to happen. But luck intervened. Mitchell’s friend from Cornell, Pete Carruthers, had just been hired to build up the theory division (“T Division”) at Los Alamos, and given carte blanche to hire several bright young physicists. Pete would later tell me with pride (as part of his advice to me about general scientific management) that he had a gut feeling that Mitchell could do something great, and that despite other people’s input—and the evidence—he decided to bet on Mitchell.

Having brought Mitchell to Los Alamos, Pete set about suggesting projects for him. At first, it was following up on some of Pete’s own work, and trying to compute bulk collective (“transport”) properties of quantum field theories as a way to understand high-energy particle collisions—a kind of foreshadowing of investigations of quark-gluon plasma.

But soon Pete suggested that Mitchell try looking at fluid turbulence, and in particular on seeing whether renormalization group methods might help in understanding it.

Whenever a fluid—like water—flows sufficiently rapidly it forms lots of little eddies and behaves in a complex and seemingly random way. But even though this qualitative phenomenon had been discussed for centuries (with, for example, Leonardo da Vinci making nice pictures of it), physics had had remarkably little to say about it—though in the 1940s Andrei Kolmogorov had given a simple argument that the eddies should form a cascade with a k^(-5/3) distribution of energies. At Los Alamos, though, with its focus on nuclear weapons development (inevitably involving violent fluid phenomena), turbulence was a very important thing to understand—even if it wasn’t obvious how to approach it.

But in 1974, there was news that Ken Wilson from Cornell had just “solved the Kondo problem” using a technique called the renormalization group. And Pete Carruthers suggested that Mitchell should try to apply this technique to turbulence.

The renormalization group is about seeing how changes of scale (or other parameters) affect descriptions (and behavior) of systems. And as it happened, it was Mitchell’s thesis advisor at MIT, Francis Low, who, along with Murray Gell-Mann, had introduced it back in 1954 in the context of quantum electrodynamics. The idea had lain dormant for many years, but in the early 1970s it came back to life with dramatic—though quite different—applications in both particle physics (specifically, QCD) and condensed matter physics.

In a piece of iron at room temperature, you can basically get all electron spins associated with each atom lined up, so the iron is magnetized. But if you heat the iron up, there start to be fluctuations, and suddenly—above the so-called Curie temperature (770°C for iron)—there’s effectively so much randomness that the magnetization disappears. And in fact there are lots of situations (think, for example, melting or boiling—or, for that matter, the formation of traffic jams) where this kind of sudden so-called phase transition occurs.

But what is actually going on in a phase transition? I think the clearest way to see this is by looking at an analog in cellular automata. With the particular rule shown below, if there aren’t very many initial black cells, the whole system will soon be white. But if you increase the number of initial black cells (as a kind of analog of increasing the temperature in a magnetic system), then suddenly, in this case at 50% black, there’s a sharp transition, and now the whole system eventually becomes black. (For phase transition experts: yes, this is a phase transition in a 1D system; one only needs 2D if the system is required to be microscopically reversible.)


GraphicsRow[
 Table[ArrayPlot[
   CellularAutomaton[<|
     "RuleNumber" -> 294869764523995749814890097794812493824, 
     "Colors" -> 4|>, 
    3 Boole[Thread[RandomReal[{0, 1}, 2000] < rho]], {500, {-300, 
      300}}], FrameLabel -> {None, 
    Row[{Round[100 rho], "% black"}]}], {rho, {0.4, 0.45, 0.55, 0.6}}], -30]

But what does the system do near 50% black? In effect, it can’t decide whether to finally become black or white. And so it ends up showing a whole hierarchy of “fluctuations” from the smallest scales to the largest. And what became clear by the 1960s is that the “critical exponents” characterizing the power laws describing these fluctuations are universal across many different systems.

But how can one compute these critical exponents? In a few toy cases, analytical methods were known. But mostly, something else was needed. And in the late 1960s Ken Wilson realized that one could use the renormalization group, and computers. One might have a model for how individual spins interact. But the renormalization group gives a procedure for “scaling up” to the interactions of larger and larger blocks of spins. And by studying that on a computer, Ken Wilson was able to start computing critical exponents.

At first, the physics world didn’t pay much attention, not least because they weren’t used to computers being so intimately in the loop in theoretical physics. But then there was the Kondo problem (and, yes, so far as I know, it has no relation to modern Kondoing—though it does relate to modern quantum dot cellular automata). In most materials, electrical resistivity decreases as the temperature decreases (going to zero for superconductors even above absolute zero). But back in the 1930s, measurements on gold had shown instead an increase of resistivity at low temperatures. By the 1960s, it was believed that this was due to the scattering of electrons from magnetic impurities—but calculations ran into trouble, generating infinite results.

But then, in 1975, Ken Wilson applied his renormalization group methods—and correctly managed to compute the effect. There was still a certain mystery about the whole thing (and it probably didn’t help that—at least when I knew him in the 1980s and beyond—I often found Ken Wilson’s explanations quite hard to understand). But the idea that the renormalization group could be important was established.

So how might it apply to fluid turbulence? Kolmogorov’s power law seemed suggestive. But could one take the Navier–Stokes equations which govern idealized fluid flow and actually derive something like this? This was the project on which Mitchell Feigenbaum embarked.

The Big Discovery

The Navier–Stokes equations are very hard to work with. In fact, to this day it’s still not clear how even the most obvious feature of turbulence—its apparent randomness—arises from these equations. (It could be that the equations aren’t a full or consistent mathematical description, and one’s actually seeing amplified microscopic molecular motions. It could be that—as in chaos theory and the Lorenz equations—it’s due to amplification of randomness in the initial conditions. But my own belief, based on work I did in the 1980s, is that it’s actually an intrinsic computational phenomenon—analogous to the randomness one sees in my rule 30 cellular automaton.)

So how did Mitchell approach the problem? He tried simplifying it—first by going from equations depending on both space and time to ones depending only on time, and then by effectively making time discrete, and looking at iterated maps. Through Paul Stein, Mitchell knew about the (not widely known) previous work at Los Alamos on iterated maps. But Mitchell didn’t quite know where to go with it, though having just got a swank new HP-65 programmable calculator, he decided to program iterated maps on it.

Then in July 1975, Mitchell went (as I also did a few times in the early 1980s) to the summer physics hang-out-together event in Aspen, CO. There he ran into Steve Smale—a well-known mathematician who’d been studying dynamical systems—and was surprised to find Smale talking about iterated maps. Smale mentioned that someone had asked him if the limit of the period-doubling cascade a∞ ≈ 3.56995 could be expressed in terms of standard constants like π. Smale related that he’d said he didn’t know. But Mitchell’s interest was piqued, and he set about trying to figure it out.

He didn’t have his HP-65 with him, but he dove into the problem using the standard tools of a well-educated mathematical physicist, and had soon turned it into something about poles of functions in the complex plane—about which he couldn’t really say anything. Back at Los Alamos in August, though, he had his HP-65, and he set about programming it to find the bifurcation points an.

The iterative procedure ran pretty fast for small n. But by n = 5 it was taking 30 seconds. And for n = 6 it took minutes. While it was computing, however, Mitchell decided to look at the an values he had so far—and noticed something: they seemed to be converging geometrically to a final value.

At first, he just used this fact to estimate a∞, which he tried—unsuccessfully—to express in terms of standard constants. But soon he began to think that actually the convergence exponent δ was more significant than a∞—since its value stayed the same under simple changes of variables in the map. For perhaps a month Mitchell tried to express δ in terms of standard constants.

But then, in early October 1975, he remembered that Paul Stein had said period doubling seemed to look the same not just for logistic maps but for any iterated map with a single hump. Reunited with his HP-65 after a trip to Caltech, Mitchell immediately tried the map x ⟶ a sin(π x)—and discovered that, at least to 3-digit precision, the exponent δ was exactly the same.

He was immediately convinced that he’d discovered something great. But Stein told him he needed more digits to really conclude much. Los Alamos had plenty of powerful computers—so the next day Mitchell got someone to show him how to write a program in FORTRAN on one of them to go further—and by the end of the day he had managed to compute that in both cases δ was about 4.6692.
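(One way to reproduce such a computation today—a sketch, not Mitchell’s actual program: find the “superstable” parameter values at which the critical point x = 1/2 is exactly periodic with period 2ⁿ, and take ratios of successive gaps. The seed values and the secant root-finder below are hand-picked assumptions of this sketch, for the logistic map only:)

```python
def iterate(a, steps):
    # apply x -> a x (1 - x) repeatedly to the critical point x = 1/2
    x = 0.5
    for _ in range(steps):
        x = a * x * (1 - x)
    return x

def superstable(a0, a1, period):
    # secant method on g(a) = f_a^period(1/2) - 1/2
    g = lambda a: iterate(a, period) - 0.5
    for _ in range(80):
        g0, g1 = g(a0), g(a1)
        if g1 == g0:
            break
        a0, a1 = a1, a1 - g1 * (a1 - a0) / (g1 - g0)
    return a1

# rough hand-picked seeds near the superstable parameters for periods 1..16
seeds = [2.0, 3.23, 3.50, 3.555, 3.5665]
A = [superstable(s - 1e-3, s, 2 ** n) for n, s in enumerate(seeds)]

# ratios of successive gaps converge toward delta = 4.669...
deltas = [(A[n] - A[n - 1]) / (A[n + 1] - A[n]) for n in range(1, len(A) - 1)]
```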

The computer he used was a typical workhorse US scientific computer of the day: a CDC 6000 series machine (of the same type I used when I first moved to the US in 1978). It had been designed by Seymour Cray, and by default it used 60-bit floating-point numbers. But at this precision (about 14 decimal digits), 4.6692 was as far as Mitchell could compute. Fortunately, however, Pete’s wife Lucy Carruthers was a programmer at Los Alamos, and she showed Mitchell how to use double precision—with the result that he was able to compute δ to 11-digit precision, and determine that the values for his two different iterated maps agreed.

Within a few weeks, Mitchell had found that δ seemed to be universal whenever the iterated map had a single quadratic maximum. But he didn’t know why this was, or have any particular framework for thinking about it. But still, finally, at the age of 30, Mitchell had discovered something that he thought was really interesting.

On Mitchell’s birthday, December 19, he saw his friend Predrag, and told him about his result. But at the time, Predrag was working hard on mainstream particle physics, and didn’t pay too much attention.

Mitchell continued working, and within a few months he was convinced that not only was the exponent δ universal—the appropriately scaled, limiting, infinitely wiggly, actual iteration of the map was too. In April 1976 Mitchell wrote a report announcing his results. Then on May 2, 1976, he gave a talk about them at the Institute for Advanced Study in Princeton. Predrag was there, and now he got interested in what Mitchell was doing.

As so often, however, it was hard to understand just what Mitchell was talking about. But by the next day, Predrag had successfully simplified things, and come up with a single, explicit, functional equation for the limiting form of the scaled iterated map: g(g(x)) = −g(α x)/α, with α ≈ 2.50290—implying that for any iterated map of the appropriate type, the limiting form would always look like an even wigglier version of:

FeigenbaumFunction plot

fUD[z_] = 
  1. - 1.5276329970363323 z^2 + 0.1048151947874277 z^4 + 
   0.026705670524930787 z^6 - 0.003527409660464297 z^8 + 
   0.00008160096594827505 z^10 + 0.000025285084886512315 z^12 - 
   2.5563177536625283*^-6 z^14 - 9.65122702290271*^-8 z^16 + 
   2.8193175723520713*^-8 z^18 - 2.771441260107602*^-10 z^20 - 
   3.0292086423142963*^-10 z^22 + 2.6739057855563045*^-11 z^24 + 
   9.838888060875235*^-13 z^26 - 3.5838769501333333*^-13 z^28 + 
   2.063994985307743*^-14 z^30;
fCF = Compile[{z}, 
   Module[{\[Alpha] = -2.5029078750959130867, n, \[Zeta]},
    n = If[Abs[z] <= 1., 0, Ceiling[Log[-\[Alpha], Abs[z]]]];
    \[Zeta] = z/\[Alpha]^n;
    Do[\[Zeta] = fUD[\[Zeta]], {2^n}];
    \[Alpha]^n \[Zeta]]];
Plot[fCF[x], {x, -100, 100}, MaxRecursion -> 5, PlotRange -> All]
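(The functional equation can be checked numerically against the polynomial approximation above; here is a Python port for illustration, with the coefficients copied from fUD and α from the code above:)

```python
# even-power coefficients of the limiting function g (copied from fUD above)
coeffs = [1.0, -1.5276329970363323, 0.1048151947874277, 0.026705670524930787,
          -0.003527409660464297, 0.00008160096594827505, 0.000025285084886512315,
          -2.5563177536625283e-06, -9.65122702290271e-08, 2.8193175723520713e-08,
          -2.771441260107602e-10, -3.0292086423142963e-10, 2.6739057855563045e-11,
          9.838888060875235e-13, -3.5838769501333333e-13, 2.063994985307743e-14]
alpha = 2.5029078750959130867

def g(z):
    # g(z) = sum_k c_k z^(2 k), valid for |z| <= 1
    return sum(c * z ** (2 * k) for k, c in enumerate(coeffs))

# check the functional equation g(g(x)) = -g(alpha x)/alpha at a few points
for x in (0.0, 0.1, 0.25, 0.35):
    assert abs(g(g(x)) + g(alpha * x) / alpha) < 1e-6
```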

How It Developed

The whole area of iterated maps got a boost on June 10, 1976, with the publication in Nature of Robert May’s survey about them, written independent of Mitchell and (of course) not mentioning his results. But in the months that followed, Mitchell traveled around and gave talks about his results. The reactions were mixed. Physicists wondered how the results related to physics. Mathematicians wondered about their status, given that they came from experimental mathematics, without any formal mathematical proof. And—as always—people found Mitchell’s explanations hard to understand.

In the fall of 1976, Predrag went as a postdoc to Oxford—and on the very first day that I showed up as a 17-year-old particle-physics-paper-writing undergraduate, I ran into him. We talked mostly about his elegant “bird tracks” method for doing group theory (about which he finally published a book 32 years later). But he also tried to explain iterated maps. And I still remember him talking about an idealized model for fish populations in the Adriatic Sea (only years later did I make the connection that Predrag was from what is now Croatia).

At the time I didn’t pay much attention, but somehow the idea of iterated maps lodged in my consciousness, soon mixed together with the notion of fractals that I learned from Benoit Mandelbrot’s book. And when I began to concentrate on issues of complexity a couple of years later, these ideas helped guide me towards systems like cellular automata.

But back in 1976, Mitchell (who I wouldn’t meet for several more years) was off giving lots of talks about his results. He also submitted a paper to the prestigious academic journal Advances in Mathematics. For 6 months he heard nothing. But eventually the paper was rejected. He tried again with another paper, now sending it to the SIAM Journal of Applied Mathematics. Same result.

I have to say I’m not surprised this happened. In my own experience of academic publishing (now long in the past), if one was reporting progress within an established area it wasn’t too hard to get a paper published. But anything genuinely new or original one could pretty much count on getting rejected by the peer review process, either through intellectual shortsightedness or through academic corruption. And for Mitchell there was the additional problem that his explanations weren’t easy to understand.

But finally, in late 1977, Joel Lebowitz, editor of the Journal of Statistical Physics, agreed to publish Mitchell’s paper—essentially on the basis of knowing Mitchell, even though he admitted he didn’t really understand the paper. And so it was that early in 1978 “Quantitative Universality for a Class of Nonlinear Transformations”—reporting Mitchell’s big result—officially appeared. (For purposes of academic priority, Mitchell would sometimes quote a summary of a talk he gave on August 26, 1976, that was published in the Los Alamos Theoretical Division Annual Report 1975–1976. Mitchell was quite affected by the rejection of his papers, and for years kept the rejection letters in his desk drawer.)

Mitchell continued to travel the world talking about his results. There was interest, but also confusion. But in the summer of 1979, something exciting happened: Albert Libchaber in Paris reported results on a physical experiment on the transition to turbulence in convection in liquid helium—where he saw period doubling, with exactly the exponent δ that Mitchell had calculated. Mitchell’s δ apparently wasn’t just universal to a class of mathematical systems—it also showed up in real, physical systems.

Pretty much immediately, Mitchell was famous. Connections to the renormalization group had been made, and his work was becoming fashionable among both physicists and mathematicians. Mitchell himself was still traveling around, but now he was regularly hobnobbing with the top physicists and mathematicians.

I remember him coming to Caltech, perhaps in the fall of 1979. There was a certain rock-star character to the whole thing. Mitchell showed up, gave a stylish but somewhat mysterious talk, and was then whisked away to talk privately with Richard Feynman and Murray Gell-Mann.

Soon Mitchell was being offered all sorts of high-level jobs, and in 1982 he triumphantly returned to Cornell as a full professor of physics. There was an air of Nobel Prize–worthiness, and by June 1984 he was appearing in the New York Times Magazine, in full Beethoven mode, in front of a Cornell waterfall:

Mitchell in New York Times Magazine

Still, the mathematicians weren’t satisfied. As with Benoit Mandelbrot’s work, they tended to see Mitchell’s results as mere “numerical conjectures”, not proven and not always even quite worth citing. But top mathematicians (whom Mitchell had befriended) were soon working on the problem, and results began to appear—though it took a decade for there to be a full, final proof of the universality of δ.

Where the Science Went

So what happened to Mitchell’s big discovery? It was famous, for sure. And, yes, period-doubling cascades with his universal features were seen in a whole sequence of systems—in fluids, optics and more. But how general was it, really? And could it, for example, be extended to the full problem of fluid turbulence?

Mitchell and others studied systems other than iterated maps, and found some related phenomena. But none were quite as striking as Mitchell’s original discovery.

In a sense, my own efforts on cellular automata and the behavior of simple programs, beginning around 1981, have tried to address some of the same bigger questions as Mitchell’s work might have led to. But the methods and results have been very different. Mitchell always tried to stay close to the kinds of things that traditional mathematical physics can address, while I unabashedly struck out into the computational universe, investigating the phenomena that occur there.

I tried to see how Mitchell’s work might relate to mine—and even in my very first paper on cellular automata in 1981 I noted for example that the average density of black cells on successive steps of a cellular automaton’s evolution can be approximated (in “mean field theory”) by an iterated map.
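That mean-field map can be written down directly: if each cell is assumed independently black with probability p, the density after one step is just the total probability of the neighborhoods whose rule outcome is black. A small sketch, in Python for illustration:

```python
from itertools import product

def mean_field_map(rule_number):
    """Mean-field density map p -> p' for an elementary cellular automaton:
    sum the probabilities of all 3-cell neighborhoods that produce black,
    treating cells as independent with density p."""
    outputs = [(rule_number >> i) & 1 for i in range(8)]
    def step(p):
        total = 0.0
        for left, center, right in product((0, 1), repeat=3):
            if outputs[4 * left + 2 * center + right]:
                w = 1.0
                for b in (left, center, right):
                    w *= p if b else 1 - p
                total += w
        return total
    return step

# For rule 22 the map works out to p' = 3 p (1 - p)^2, with stable
# fixed point 1 - 1/sqrt(3) ~ 0.4226
f = mean_field_map(22)
p = 0.3
for _ in range(100):
    p = f(p)
print(p)
```

The fixed point is only an approximation to the density actually observed in rule 22’s evolution; that gap is precisely what makes this a mean-field approximation.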

I also noted that mathematically the whole evolution of a cellular automaton can be viewed as an iterated map—though on the Cantor set, rather than on ordinary real numbers. In my first paper, I even plotted the analog of Mitchell’s smooth mappings, but now they were wild and discontinuous:

Rules plot

(* plot each rule's induced map on 12-cell configurations, read as base-2 integers *)
GraphicsRow[
 Labeled[ListPlot[
     Table[FromDigits[CellularAutomaton[#, IntegerDigits[n, 2, 12]],
       2], {n, 0, 2^12 - 1}],
     AspectRatio -> 1, Frame -> True, FrameTicks -> None],
    Text[StringTemplate["rule ``"][#]]] & /@ {22, 42, 90, 110}]

But try as I might, I could never find any strong connection with Mitchell’s work. I looked for analogs of things like period doubling, and Sarkovskii’s theorem, but didn’t find much. In my computational framework, even thinking about real numbers, with their infinite sequence of digits, was a bit unnatural. Years later, in A New Kind of Science, I had a note entitled “Smooth iterated maps”. I showed their digit sequences, and observed, rather undramatically, that Mitchell’s discovery implied an unusual nested structure at the beginning of the sequences:


(* base-2 digit sequences of logistic-map iterates, for several values of a *)
FractionalDigits[x_, digs_Integer] := 
 NestList[{Mod[2 First[#], 1], Floor[2 First[#]]} &, {x, 0}, digs][[2 ;;, -1]];

GraphicsColumn[
 Function[a, ArrayPlot[
    FractionalDigits[#, 40] & /@ 
     NestList[a # (1 - #) &, N[1/8, 80], 80]]] /@ {2.5, 3.3, 3.4, 3.5, 3.6, 4}]

The Rest of the Story

Portrait of Mitchell
(Photograph by Predrag Cvitanović)

So what became of Mitchell? After four years at Cornell, he moved to the Rockefeller University in New York, and for the next 30 years settled into a somewhat Bohemian existence, spending most of his time at his apartment on the Upper East Side of Manhattan.

While he was still at Los Alamos, Mitchell had married a woman from Germany named Cornelia, who was the sister of the wife of physicist (and longtime friend of mine) David Campbell, who had started the Center for Nonlinear Studies at Los Alamos, and would later go on to be provost at Boston University. But after not too long, Cornelia left Mitchell, taking up instead with none other than Pete Carruthers. (Pete—who struggled with alcoholism and other issues—later reunited with Lucy, but died in 1997 at the age of 61.)

When he was back at Cornell, Mitchell met a woman named Gunilla, who had run away from her life as a pastor’s daughter in a small town in northern Sweden at the age of 14, had ended up as a model for Salvador Dalí, and then in 1966 had been brought to New York as a fashion model. Gunilla had been a journalist, video maker, playwright and painter. Mitchell and she married in 1986, and remained married for 26 years, during which time Gunilla developed quite a career as a figurative painter.

Mitchell’s last solo academic paper was published in 1987. He did publish a handful of other papers with various collaborators, though none were terribly remarkable. Most were extensions of his earlier work, or attempts to apply traditional methods of mathematical physics to various complex fluid-like phenomena.

Mitchell liked interacting with the upper echelons of academia. He received all sorts of honors and recognition (though never a Nobel Prize). But to the end he viewed himself as something of an outsider—a Renaissance man who happened to have focused on physics, but didn’t really buy into all its institutions or practices.

From the early 1980s on, I used to see Mitchell fairly regularly, in New York or elsewhere. He became a daily user of Mathematica, singing its praises and often telling me about elaborate calculations he had done with it. Like many mathematical physicists, Mitchell was a connoisseur of special functions, and would regularly talk to me about more and more exotic functions he thought we should add.

Mitchell had two major excursions outside of academia. By the mid-1980s, the young poetess—now named Kathy Hammond—that Mitchell had known at Cornell had been an advertising manager for the New York Times and had then married into the family that owned the Hammond World Atlas. And through this connection, Mitchell was pulled into a completely new field for him: cartography.

I talked to him about it many times. He was very proud of figuring out how to use the Riemann mapping theorem to produce custom local projections for maps. He described (though I never fully understood it) a very physics-based algorithm for placing labels on maps. And he was very pleased when finally an entirely new edition of the Hammond World Atlas (that he would refer to as “my atlas”) came out.

Starting in the 1980s, there’d been an increasing trend for physics ideas to be applied to quantitative finance, and for physicists to become Wall Street quants. And with people in finance continually looking for a unique edge, there was always an interest in new methods. I was certainly contacted a lot about this—but with the success of James Gleick’s 1987 book Chaos (for which I did a long interview, though I was only mentioned, misspelled, in a list of scientists who’d been helpful), there was a whole new set of people looking to see how “chaos” could help them in finance.

One of those was a certain Michael Goodkin. When he was in college back in the early 1960s, Goodkin had started a company that marketed the legal research services of law students. A few years later, he enlisted several Nobel Prize–winning economists and started what may have been the first hedge fund to do computerized arbitrage trading. Goodkin had always been a high-rolling, globetrotting gambler and backgammon player, and he made and lost a lot of money. And, down on his luck, he was looking for the next big thing—and found chaos theory, and Mitchell Feigenbaum.

For a few years he cultivated various physicists, then in 1995 he found a team to start a company called Numerix to commercialize the use of physics-like methods in computations for increasingly exotic financial instruments. Mitchell Feigenbaum was the marquee name, though the heavy lifting was mostly done by my longtime friend Nigel Goldenfeld, and a younger colleague of his named Sasha Sokol.

At the beginning there was lots of mathematical-physics-like work, and Mitchell was quite involved. (He was an enthusiast of Itô calculus, gave lectures about it, and was proud of having found a 1000× speedup of stochastic integrations.) But what the company actually did was to write C++ libraries for banks to integrate into their systems. It wasn’t something Mitchell wanted to do long term. And after a number of years, Mitchell’s active involvement in the company declined.

(I’d met Michael Goodkin back in 1998, and 14 years later—having recently written his autobiography The Wrong Answer Faster: The Inside Story of Making the Machine That Trades Trillions—he suddenly contacted me again, pitching my involvement in a rather undefined new venture. Mitchell still spoke highly of Michael, though when the discussion rather bizarrely pivoted to me basically starting and CEOing a new company, I quickly dropped it.)

I had many interactions with Mitchell over the years, though they’re not as well archived as they might be, because they tended to be verbal rather than written, since, as Mitchell told me (in email): “I dislike corresponding by email. I still prefer to hear an actual voice and interact…”

There are fragments in my archive, though. There’s correspondence, for example, about Mitchell’s 2004 60th-birthday event, which I couldn’t attend because it conflicted with a significant birthday for one of my children. In lieu of attending, I commissioned the creation of a “Feigenbaum–Cvitanović Crystal”—a 3D rendering in glass of the limiting function g(z) in the complex plane.

It was a little complex to solve the functional equation, and the laser manufacturing method initially shattered a few blocks of glass, but eventually the object was duly made, and sent—and I was pleased many years later to see it nicely displayed in Mitchell’s apartment:

Feigenbaum–Cvitanović crystal

Sometimes my archives record mentions of Mitchell by others, usually Predrag. In 2007, Predrag reported (with characteristic wit):

“Other news: just saw Mitchell, he is dating Odyssey.

No, no, it’s not a high-level Washington type escort service—he is dating Homer’s Odyssey, by computing the positions of low stars as function of the 26000 year precession—says Hiparcus [sic] had it all figured out, but Catholic church succeeded in destroying every single copy of his tables.”

Living up to the Renaissance man tradition, Mitchell always had a serious interest in history. In 2013, responding to a piece of mine about Leibniz, Mitchell said he’d been a Leibniz enthusiast since he was a teenager, then explained:

“The Newton hagiographer (literally) Voltaire had no idea of the substance of the Monadology, so could only spoof ‘the best of all possible worlds’. Long ago I’ve published this as a verbal means of explaining 2^n universality.

Leibniz’s second published paper at age 19, ‘On the Method of Inverse Tangents’, or something like that, is actually the invention of the method of isoclines to solve ODEs, quite contrary to the extant scholarly commentary. Both Leibniz and Newton start with differential equations, already having received the diff. calculus. This is quite an intriguing story.”

But the mainstay of Mitchell’s intellectual life was always mathematical physics, though done more as a personal matter than as part of institutional academic work. At some point he was asked by his then-young goddaughter (he never had children of his own) why the Moon looks larger when it’s close to the horizon. He wrote back an explanation (a bit in the style of Euler’s Letters to a German Princess), then realized he wasn’t sure of the answer, and got launched into many years of investigation of optics and image formation. (He’d actually been generally interested in the retina since he was at MIT, influenced by Jerry Lettvin of “What the Frog’s Eye Tells the Frog’s Brain” fame.)

He would tell me about it, explaining that the usual theory of image formation was wrong, and he had a better one. He always used the size of the Moon as an example, but I was never quite clear whether the issue was one of optics or perception. He never published anything about what he did, though with luck his manuscripts (rumored to have the makings of a book) will eventually see the light of day—assuming others can understand them.

When I would visit Mitchell (and Gunilla), their apartment had a distinctly Bohemian feel, with books, papers, paintings and various devices strewn around. And then there was The Bird. It was a cockatoo, and it was loud. I’m not sure who got it or why. But it was a handful. Mitchell and Gunilla nearly got ejected from their apartment because of noise complaints from neighbors, and they ended up having to take The Bird to therapy. (As I learned in a slightly bizarre—and never executed—plan to make videogames for “they-are-alien-intelligences-right-here-on-this-planet” pets, cockatoos are social and, as pets, arguably really need a “Twitter for Cockatoos”.)

The Bird
(Photograph by Predrag Cvitanović)

In the end, though, it was Gunilla who left, with the rumor being that she’d been driven away by The Bird.

The last time I saw Mitchell in person was a few years ago. My son Christopher and I visited him at his apartment—and he was in full Mitchell form, with eyes bright, talking rapidly and just a little conspiratorially about the mathematical physics of image formation. “Bird eyes are overrated”, he said, even as his cockatoo squawked in the next room. “Eagles have very small foveas, you know. Their eyes are like telescopes.”

“Fish have the best eyes”, he said, explaining that all eyes evolved underwater—and that the architecture hadn’t really changed since. “Fish keep their entire field of view in focus, not like us”, he said. It was charming, eccentric, and very Mitchell.

For years, we had talked from time to time on the phone, usually late at night. I saw Predrag a few months ago, saying that I was surprised not to have heard from Mitchell. He explained that Mitchell was sick, but was being very private about it. Then, a few weeks ago, just after midnight, Predrag sent me an email with the subject line “Mitchell is dead”, explaining that Mitchell had died at around 8 pm, and attaching a quintessential Mitchell-in-New-York picture:

Mitchell in New York
(Photograph by Predrag Cvitanović)

It’s kind of a ritual I’ve developed when I hear that someone I know has died: I immediately search my archives. And this time I was surprised to find that a few years ago Mitchell had successfully reached a voicemail inbox I didn’t know I had. So now we can give Mitchell the last word:

Play Audio
Mitchell's voicemail

And, of course, the last number too: 4.66920160910299067185320382…

Testifying at the Senate about A.I.-Selected Content on the Internet
Wed, 26 Jun 2019 03:05:12 +0000
Stephen Wolfram

Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms

An Invitation to Washington

Three and a half weeks ago I got an email asking me if I’d testify at a hearing of the US Senate Commerce Committee’s Subcommittee on Communications, Technology, Innovation and the Internet. Given that the title of the hearing was “Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms” I wasn’t sure why I’d be relevant.

But then the email went on: “The hearing is intended to examine, among other things, whether algorithmic transparency or algorithmic explanation are policy options Congress should be considering.” That piqued my interest, because, yes, I have thought about “algorithmic transparency” and “algorithmic explanation”, and their implications for the deployment of artificial intelligence.

Generally I stay far away from anything to do with politics. But figuring out how the world should interact with AI is really important. So I decided that—even though it was logistically a bit difficult—I should do my civic duty and go to Washington and testify.

Understanding the Issues

So what was the hearing really about? For me, it was in large measure an early example of reaction to the realization that, yes, AIs are starting to run the world. Billions of people are being fed content that is basically selected for them by AIs, and there are mounting concerns about this, as reported almost every day in the media.

Are the AIs cleverly hacking us humans to get us to behave in a certain way? What kind of biases do the AIs have, relative to what the world is like, or what we think the world should be like? What are the AIs optimizing for, anyway? And when are there actually “humans behind the curtain”, controlling in detail what the AIs are doing?

It doesn’t help that in some sense the AIs are getting much more free rein than they might because the people who use them aren’t really their customers. I have to say that back when the internet was young, I personally never thought it would work this way, but in today’s world many of the most successful businesses on the internet—including Google, Facebook, YouTube and Twitter—make their revenue not from their users, but instead from advertisers who are going through them to reach their users.

All these businesses also have in common that they are fundamentally what one can call “automated content selection businesses”: they work by getting large amounts of content that they didn’t themselves generate, then using what amounts to AI to automatically select what content to deliver or to suggest to any particular user at any given time—based on data that they’ve captured about that user. Part of what’s happening is presumably optimized to give a good experience to their users (whatever that might mean), but part of it is also optimized to get revenue from the actual customers, i.e. advertisers. And there’s also an increasing suspicion that somehow the AI is biased in what it’s doing—maybe because someone explicitly made it be, or because it somehow evolved that way.

“Open Up the AI”?

So why not just “open up the AI” and see what it’s doing inside? Well, that’s what the algorithmic transparency idea mentioned in the invitation to the hearing is about.

And the problem is that, no, that can’t work. If we want to seriously use the power of computation—and AI—then inevitably there won’t be a “human-explainable” story about what’s happening inside.

So, OK, if you can’t check what’s happening inside the AI, what about putting constraints on what the AI does? Well, to do that, you have to say what you want. What rule for balance between opposing kinds of views do you want? How much do you allow people to be unsettled by what they see? And so on.

And there are two problems here: first, what to want, and, second, how to describe it. In the past, the only way we could imagine describing things like this was with traditional legal rules, written in legalese. But if we want AIs to automatically follow these rules, perhaps billions of times a second, that’s not good enough: instead, we need something that AIs can intrinsically understand.

And at least on this point I think we’re making good progress. Because—thanks to our 30+ years of work on Wolfram Language—we’re now beginning to have a computational language that has the scope to formulate “computational contracts” that can specify relevant kinds of constraints in computational terms, in a form that humans can write and understand, and that machines can automatically interpret.
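As a toy illustration of what it means for a constraint to be machine-interpretable (in Python rather than the Wolfram Language, and with entirely invented rules and thresholds), a “computational contract” can be thought of as executable code that any system can evaluate against a proposed feed:

```python
from collections import Counter

# Hypothetical sketch: the rule names and thresholds here are invented
# for illustration, not drawn from any actual platform or law.

def balanced_sources(feed, max_share=0.5):
    """No single source may supply more than max_share of the feed."""
    counts = Counter(item["source"] for item in feed)
    return all(n / len(feed) <= max_share for n in counts.values())

def no_flagged_content(feed):
    """Every item must have passed an upstream classifier check."""
    return all(not item["flagged"] for item in feed)

CONTRACT = [balanced_sources, no_flagged_content]

def satisfies(feed):
    # a feed complies if every rule in the contract holds
    return all(rule(feed) for rule in CONTRACT)

feed = [
    {"source": "A", "flagged": False},
    {"source": "A", "flagged": False},
    {"source": "B", "flagged": False},
    {"source": "C", "flagged": False},
]
print(satisfies(feed))
```

The point of a real computational language is to let such rules be written at a much higher level than this, but the principle is the same: the constraint itself is something a machine can check, billions of times a second.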

But even though we’re beginning to have the tools, there’s still the huge question of what the “computational laws” for automatic content selection AIs will be.

A lot of the hearing ultimately revolved around Section 230 of the 1996 Communications Decency Act—which specifies what kinds of content companies can choose to block without losing their status as “neutral platforms”. There’s a list of fairly uncontroversially blockable kinds of content. But then the sentence ends with “or otherwise objectionable [content]”. What does this mean? Does it mean content that espouses objectionable points of view? Whose definition of “objectionable”? Etc.

Well, one day things like Section 230 will, of necessity, not be legalese laws, but computational laws. There’ll be some piece of computational language that specifies for example that this-or-that machine learning classifier trained on this-or-that sample of the internet will be used to define this or that.

We’re not there yet, however. We’re only just beginning to be able to set up computational contracts for much simpler things, like business situations. And—somewhat fueled by blockchain—I expect that this will accelerate in the years to come. But it’s going to be a while before the US Senate is routinely debating lines of code in computational laws.

So, OK, what can be done now?

A Possible Path Forward?

A little more than a week ago, what I’d figured out was basically what I’ve already described here. But that meant I was looking at going to the hearing and basically saying only negative things. “Sorry, this won’t work. You can’t do that. The science says it’s impossible. The solution is years in the future.” Etc.

And, as someone who prides himself on turning the seemingly impossible into the possible, this didn’t sit well with me. So I decided I’d better try to figure out if I could actually see a pragmatic, near-term path forward. At first, I tried thinking about purely technological solutions. But soon I basically convinced myself that no such solution was going to work.

So, with some reticence, I decided I’d better start thinking about other kinds of solutions. Fortunately there are quite a few people at my company and in my circle who I could talk to about this—although I soon discovered they often had strongly conflicting views. But after a little while, a glimmer of an idea emerged.

Why does every aspect of automated content selection have to be done by a single business? Why not open up the pipeline, and create a market in which users can make choices for themselves?

One of the constraints I imposed on myself is that my solution couldn’t detract from the impressive engineering and monetization of current automated content selection businesses. But I came up with at least two potential ways to open things up that I think could still perfectly well satisfy this constraint.

One of my ideas involved introducing what I call “final ranking providers”: third parties who take pre-digested feature vectors from the underlying content platform, then use these to do the final ranking of items in whatever way they want. My other ideas involved introducing “constraint providers”: third parties who provide constraints in the form of computational contracts that are inserted into the machine learning loop of the automated content selection system.
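To make the first of these ideas concrete, here is a deliberately toy sketch (in Python; every name in it is hypothetical, invented for illustration, not any platform’s actual API). The underlying platform supplies candidate items with pre-digested feature vectors; different final ranking providers then order the same candidates according to different values:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Item:
    item_id: str
    features: List[float]   # "pre-digested" feature vector from the platform

# A final ranking provider is a function from candidate items plus a
# user's feature vector to an ordered list of item ids.
RankingProvider = Callable[[List[Item], List[float]], List[str]]

def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def engagement_ranker(items: List[Item], user_vec: List[float]) -> List[str]:
    # one provider: rank purely by predicted user/item affinity
    return [it.item_id
            for it in sorted(items, key=lambda it: -dot(it.features, user_vec))]

def diversity_ranker(items: List[Item], user_vec: List[float]) -> List[str]:
    # another provider with different values: greedily penalize items
    # too similar to ones already chosen
    chosen: List[Item] = []
    remaining = list(items)
    while remaining:
        best = max(remaining,
                   key=lambda it: dot(it.features, user_vec)
                   - max((dot(it.features, c.features) for c in chosen),
                         default=0.0))
        chosen.append(best)
        remaining.remove(best)
    return [it.item_id for it in chosen]

# the platform stays monolithic; the user just picks whose ranking runs last
items = [Item("a", [1.0, 0.0]), Item("b", [0.9, 0.1]), Item("c", [0.0, 1.0])]
user = [1.0, 0.2]
print(engagement_ranker(items, user))
print(diversity_ranker(items, user))
```

The two providers see identical inputs yet produce different feeds, which is the whole point: the heavy machine learning stays inside the platform, while the final ordering becomes a matter of user choice.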

The important feature of both these solutions is that users don’t have to trust the single AI of the automated content selection business. They can in effect pick their own brand of AI—provided by a third party they trust—to determine what content they’ll actually be given.

Who would these third-party providers be? They might be existing media organizations, or nonprofits, or startups. Or they might be something completely new. They’d have to have some technical sophistication. But fundamentally what they’d have to do is to define—or represent—brands that users would trust to decide what the final list of items in their news feed, or video recommendations, or search results, or whatever, might be.

Social networks get their usefulness by being monolithic: by having “everyone” connected into them. But while the network itself can prosper as a monolithic thing, there doesn’t need to be just one monolithic AI that selects content for all the users on the network. Instead, there can be a whole market of AIs that users can freely pick between.

And here’s another important thing: right now there’s no consistent market pressure on the final details of how content is selected for users, not least because users aren’t the final customers. (Indeed, pretty much the only pressure right now comes from PR eruptions and incidents.) But if the ecosystem changes, and there are third parties whose sole purpose is to serve users, and to deliver the final content they want, then there’ll start to be real market forces that drive innovation—and potentially add more value.

Could It Work?

AI provides powerful ways to automate the doing of things. But AIs on their own can’t ultimately decide what they want to do. That has to come from outside—from humans defining goals. But at a practical level, where should those goals be set? Should they just come—monolithically—from an automated content selection business? Or should users have more freedom, and more choice?

One might say: “Why not let every user set everything for themselves?”. Well, the problem with that is that automated content selection is a complicated matter. And—much as I hope that there’ll soon be very widespread computational language literacy—I don’t think it’s realistic that everyone will be able to set up everything in detail for themselves. So instead, I think the better idea is to have discrete third-party providers, who set things up in a way that appeals to some particular group of users.

Then standard market forces can come into play. No doubt the result would be an even greater level of overall success at delivering content users want (and monetizing it). But this market approach also solves some other problems associated with the “single point of failure” monolithic AI.

For example, with the monolithic AI, if someone figures out how to spread some kind of bad content, it’ll spread everywhere. With third-party providers, there’s a good chance it’ll only spread through some of them.

Right now there’s lots of unhappiness about people simply being “banned” from particular content platforms. But with the market of third-party providers, banning is not an all-or-nothing proposition anymore: some providers could ban someone, but others might not.

OK, but are there “fatal flaws” with my idea? People could object that it’s technically difficult to do. I don’t know the state of the codebases inside the major automated content selection businesses. But I’m certain that with manageable effort, appropriate APIs etc. could be set up. (And it might even help these businesses by forcing some code cleanup and modernization.)

Another issue might be: how will the third-party providers be incentivized? I can imagine some organizations just being third-party providers as a public service. But in other cases they’d have to be paid a commission by the underlying content platform. The theory, though, is that good work by third-party providers would expand the whole market, and make them “worth their commission”. Plus, of course, the underlying content platforms could save a lot by not having to deal with all those complaints and issues they’re currently getting.

What if there’s a third-party provider that upranks content some people don’t like? That will undoubtedly happen. But the point is that this is a market—so market dynamics can operate.

Another objection is that my idea makes even worse the tendency of modern technology to let people live inside “content bubbles” where they never broaden their points of view. Well, of course, there can be providers who offer broader content. But people could also choose “content bubble” providers. The good thing, though, is that they’re choosing them, and they know they’re doing that, just like they know they’re choosing to watch one television channel and not another.

Of course it’s important for the operation of society that people have some level of shared values. But what should those shared values be, and who should decide them? In a totalitarian system, it’s basically the government. Right now, with the current monolithic state of automated content selection, one could argue it’s the automated content selection businesses.

If I were running one of those businesses, I’d certainly not want to get set up as the moral arbiter for the world; it seems like a no-win role. With the third-party providers idea, there’s a way out, without damaging the viability of the business. Yes, users get more control, as arguably they should have, given that they are the fuel that makes the business work. But the core business model is still perfectly intact. And there’s a new market that opens up, for third-party providers, potentially delivering all sorts of new economic value.

What Should I Do?

At the beginning of last weekend, what I just described was basically the state of my thinking. But what should I do with it? Was there some issue I hadn’t noticed? Was I falling into some political or business trap? I wasn’t sure. But it seemed as if some idea in this area was needed, and I had an idea, so I really should tell people about it.

So I quickly wrote up the written testimony for the hearing, and sent it in by the deadline on Sunday morning. (The full text of the testimony is included at the end of this piece.)

Stephen Wolfram's written testimony

The Hearing Itself

View of the Senate

This morning was the hearing itself. It was in the same room as the hearing Mark Zuckerberg did last fall. The staffers were saying that they expected a good turnout of senators, and that of the 24 senators on the subcommittee (out of 100 total in the Senate), they expected about 15 to show up at some point or another.

At the beginning, staffers were putting out nameplates for the senators. I was trying to figure out what the arrangement was. And then I realized! It was a horseshoe configuration and Republican senators were on the right side of the horseshoe, Democrats were on the left. There really are right and left wings! (Yes, I obviously don’t watch C-SPAN enough, or I’d already know that.)

When the four of us on the panel were getting situated, one of the senators (Marsha Blackburn [R-TN]) wandered up, and started talking about computational irreducibility. Wow, I thought, this is going to be interesting. That’s a pretty abstruse science concept to be finding its way into the Senate.

Everyone had five minutes to give opening remarks, and everyone had a little countdown timer in front of them. I talked a bit about the science and technology of AI and explainability. I mentioned computational contracts and the concept of an AI Constitution. Then I said I didn’t want to just explain that everything was impossible—and gave a brief summary of my ideas for solutions. Rather uncharacteristically for me, I ended a full minute before my time was up.

The format for statements and questions was five minutes per senator. The issues raised were quite diverse. I quickly realized, though, that it was unfortunate that I really had three different things I was talking about (non-explainability, computational laws, and my ideas for a near-term solution). In retrospect perhaps I should have concentrated on the near-term solution, but it felt odd to be emphasizing something I just thought of last week, rather than something I’ve thought about for many years.

Still, it was fascinating—and a sign of things to come—to see serious issues about what amounts to the philosophy of computation being discussed in the Senate. To be fair, I had done a small hearing at the Senate back in 2003 (my only other such experience) about the ideas in A New Kind of Science. But then it had been very much on the “science track”; now the whole discussion was decidedly mainstream.

I couldn’t help thinking that I was witnessing the concept of computation beginning to come of age. What used to be esoteric issues in the theory of computation were now starting to be things that senators were discussing writing laws about. One of the senators mentioned atomic energy, and compared it to AI. But really, AI is going to be something much more central to the whole future of our species.

It enables us to do so much, yet it forces us to confront what we want to do, and who we want to be. Today it’s rare and exotic for the Senate to be discussing issues of AI. In time I suspect AI and its many consequences will be a dominant theme in many Senate discussions. This is just the beginning.

I wish we were ready to really start creating an AI Constitution. But we’re not (and it doesn’t help that we don’t have an AI analog of the few thousand years of human political history that were available as a guide when the US Constitution was drafted). Still, issue by issue I suspect we’ll move closer to the point where having a coherent AI Constitution becomes a necessity. No doubt there’ll be different ones in different communities and different countries. But one day a group like the one I saw today—with all the diverse and sometimes colorful characters involved—will end up having to figure out just how we humans interact with AI and the computational world.

The Written Testimony


Automated content selection by internet businesses has become progressively more contentious—leading to calls to make it more transparent or constrained. I explain some of the complex intellectual and scientific problems involved, then offer two possible technical and market suggestions for paths forward. Both are based on giving users a choice about who to trust for the final content they see—in one case introducing what I call “final ranking providers”, and in the other case what I call “constraint providers”.

The Nature of the Problem

There are many kinds of businesses that operate on the internet, but some of the largest and most successful are what one can call automated content selection businesses. Facebook, Twitter, YouTube and Google are all examples. All of them deliver content that others have created, but a key part of their value is associated with their ability to (largely) automatically select what content they should serve to a given user at a given time—whether in news feeds, recommendations, web search results, or advertisements.

What criteria are used to determine content selection? Part of the story is certainly to provide good service to users. But the paying customers for these businesses are not the users, but advertisers, and necessarily a key objective of these businesses must be to maximize advertising income. Increasingly, there are concerns that this objective may have unacceptable consequences in terms of content selection for users. And in addition there are concerns that—through their content selection—the companies involved may be exerting unreasonable influence in other kinds of business (such as news delivery), or in areas such as politics.

Methods for content selection—using machine learning, artificial intelligence, etc.—have become increasingly sophisticated in recent years. A significant part of their effectiveness—and economic success—comes from their ability to use extensive data about users and their previous activities. But there has been increasing dissatisfaction and, in some cases, suspicion about just what is going on inside the content selection process.

This has led to a desire to make content selection more transparent, and perhaps to constrain aspects of how it works. As I will explain, these are not easy things to achieve in a useful way. In fact, they run into deep intellectual and scientific issues that are in some ways a foretaste of problems we will encounter ever more broadly as artificial intelligence becomes more central to the things we do. Satisfactory ultimate solutions will be difficult to develop, but I will suggest here two near-term practical approaches that I believe significantly address current concerns.

How Automated Content Selection Works

Whether one’s dealing with videos, posts, webpages, news items or, for that matter, ads, the underlying problem of automated content selection (ACS) is basically always the same. There are many content items available (perhaps even billions of them), and somehow one has to quickly decide which ones are “best” to show to a given user at a given time. There’s no fundamental principle to say what “best” means, but operationally it’s usually in the end defined in terms of what maximizes user clicks, or revenue from clicks.

The major innovation that has made modern ACS systems possible is the idea of automatically extrapolating from large numbers of examples. The techniques have evolved, but the basic idea is to effectively deduce a model of the examples and then to use this model to make predictions, for example about what ranking of items will be best for a given user.

Because it will be relevant for the suggestions I’m going to make later, let me explain here a little more about how most current ACS systems work in practice. The starting point is normally to extract a collection of perhaps hundreds or thousands of features (or “signals”) for each item. If a human were doing it, they might use features like: “How long is the video? Is it entertainment or education? Is it happy or sad?” But these days—with the volume of data that’s involved—it’s a machine doing it, and often it’s also a machine figuring out what features to extract. Typically the machine will optimize for features that make its ultimate task easiest—whether or not (and it’s almost always not) there’s a human-understandable interpretation of what the features represent.

As an example, here are the letters of the alphabet automatically laid out by a machine in a “feature space” in which letters that “look similar” appear nearby:

Feature space plot

How does the machine know what features to extract to determine whether things will “look similar”? A typical approach is to give it millions of images that have been tagged with what they are of (“elephant”, “teacup”, etc.). And then from seeing which images are tagged the same (even though in detail they look different), the machine is able—using the methods of modern machine learning—to identify features that could be used to determine how similar images of anything should be considered to be.

OK, so let’s imagine that instead of letters of the alphabet laid out in a 2D feature space, we’ve got a million videos laid out in a 200-dimensional feature space. If we’ve got the features right, then videos that are somehow similar should be nearby in this feature space.
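To make the geometry concrete, here is a minimal sketch of what “nearby in feature space” means. The feature vectors are invented for illustration (real systems use learned embeddings with hundreds of dimensions), but the nearest-neighbor logic is the same:

```python
import math

# Hypothetical 3-D feature vectors for a few items (real systems use ~200 dims,
# with the features themselves learned by a machine).
features = {
    "video_a": (0.9, 0.1, 0.0),
    "video_b": (0.8, 0.2, 0.1),   # "looks similar" to video_a
    "video_c": (0.0, 0.9, 0.8),   # very different content
}

def nearest(name, feats):
    """The other item closest to `name` in feature space (Euclidean distance)."""
    target = feats[name]
    return min((k for k in feats if k != name),
               key=lambda k: math.dist(target, feats[k]))

print(nearest("video_a", features))  # video_b
```

If the features have been extracted well, proximity in this space stands in for the similarity judgments a human would make.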

But given a particular person, what videos are they likely to want to watch? Well, we can do the same kind of thing with people as with videos: we can take the data we know about each person, and extract some set of features. “Similar people” would then be nearby in “people feature space”, and so on.

But now there’s a “final ranking” problem. Given features of videos, and features of people, which videos should be ranked “best” for which people? Often in practice, there’s an initial coarse ranking. But then, as soon as we have a specific definition of “best”—or enough examples of what we mean by “best”—we can use machine learning to learn a program that will look at the features of videos and people, and will effectively see how to use them to optimize the final ranking.

The setup is a bit different in different cases, and there are many details, most of which are proprietary to particular companies. However, modern ACS systems—dealing as they do with immense amounts of data at very high speed—are a triumph of engineering, and an outstanding example of the power of artificial intelligence techniques.

Is It “Just an Algorithm”?

When one hears the term “algorithm” one tends to think of a procedure that will operate in a precise and logical way, always giving a correct answer, not influenced by human input. One also tends to think of something that consists of well-defined steps, that a human could, if needed, readily trace through.

But this is pretty far from how modern ACS systems work. They don’t deal with the same kind of precise questions (“What video should I watch next?” just isn’t something with a precise, well-defined answer). And the actual methods involved make fundamental use of machine learning, which doesn’t have the kind of well-defined structure or explainable step-by-step character that’s associated with what people traditionally think of as an “algorithm”. There’s another thing too: while traditional algorithms tend to be small and self-contained, machine learning inevitably requires large amounts of externally supplied data.

In the past, computer programs were almost exclusively written directly by humans (with some notable exceptions in my own scientific work). But the key idea of machine learning is instead to create programs automatically, by “learning the program” from large numbers of examples. The most common type of program on which to apply machine learning is a so-called neural network. Although originally inspired by the brain, neural networks are purely computational constructs that are typically defined by large arrays of numbers called weights.

Imagine you’re trying to build a program that recognizes pictures of cats versus dogs. You start with lots of specific pictures that have been identified—normally by humans—as being either of cats or dogs. Then you “train” a neural network by showing it these pictures and gradually adjusting its weights to make it give the correct identification for these pictures. But then the crucial point is that the neural network generalizes. Feed it another picture of a cat, and even if it’s never seen that picture before, it’ll still (almost certainly) say it’s a cat.

What will it do if you feed it a picture of a cat dressed as a dog? It’s not clear what the answer is supposed to be. But the neural network will still confidently give some result—that’s derived in some way from the training data it was given.

So in a case like this, how would one tell why the neural network did what it did? Well, it’s difficult. All those weights inside the network were learned automatically; no human explicitly set them up. It’s very much like the case of extracting features from images of letters above. One can use these features to tell which letters are similar, but there’s no “human explanation” (like “count the number of loops in the letter”) of what each of the features are.

Would it be possible to make an explainable cat vs. dog program? For 50 years most people thought that a problem like cat vs. dog just wasn’t the kind of thing computers would be able to do. But modern machine learning made it possible—by learning the program rather than having humans explicitly write it. And there are fundamental reasons to expect that there can’t in general be an explainable version—and that if one’s going to do the level of automated content selection that people have become used to, then one cannot expect it to be broadly explainable.

Sometimes one hears it said that automated content selection is just “being done by an algorithm”, with the implication that it’s somehow fair and unbiased, and not subject to human manipulation. As I’ve explained, what’s actually being used are machine learning methods that aren’t like traditional precise algorithms.

And a crucial point about machine learning methods is that by their nature they’re based on learning from examples. And inevitably the results they give depend on what examples were used.

And this is where things get tricky. Imagine we’re training the cat vs. dog program. But let’s say that, for whatever reason, among our examples there are spotted dogs but no spotted cats. What will the program do if it’s shown a spotted cat? It might successfully recognize the shape of the cat, but quite likely it will conclude—based on the spots—that it must be seeing a dog.

So is there any way to guarantee that there are no problems like this, that were introduced either knowingly or unknowingly? Ultimately the answer is no—because one can’t know everything about the world. Is the lack of spotted cats in the training set an error, or are there simply no spotted cats in the world?

One can do one’s best to find correct and complete training data. But one will never be able to prove that one has succeeded.

But let’s say that we want to ensure some property of our results. In almost all cases, that’ll be perfectly possible—either by modifying the training set, or the neural network. For example, if we want to make sure that spotted cats aren’t left out, we can just insist, say, that our training set has an equal number of spotted and unspotted cats. That might not be a correct representation of what’s actually true in the world, but we can still choose to train our neural network on that basis.

As a different example, let’s say we’re selecting pictures of pets. How many cats should be there, versus dogs? Should we base it on the number of cat vs. dog images on the web? Or how often people search for cats vs. dogs? Or how many cats and dogs are registered in America? There’s no ultimate “right answer”. But if we want to, we can give a constraint that says what should happen.

This isn’t really an “algorithm” in the traditional sense either—not least because it’s not about abstract things; it’s about real things in the world, like cats and dogs. But an important development (that I happen to have been personally much involved in for 30+ years) is the construction of a computational language that lets one talk about things in the world in a precise way that can immediately be run on a computer.

In the past, things like legal contracts had to be written in English (or “legalese”). Somewhat inspired by blockchain smart contracts, we are now getting to the point where we can write automatically executable computational contracts not in human language but in computational language. And if we want to define constraints on the training sets or results of automated content selection, this is how we can do it.

Issues from Basic Science

Why is it difficult to find solutions to problems associated with automated content selection? In addition to all the business, societal and political issues, there are also some deep issues of basic science involved. Here’s a list of some of those issues. The precursors of these issues date back nearly a century, though it’s only quite recently (in part through my own work) that they’ve become clarified. And although they’re not enunciated (or named) as I have here, I don’t believe any of them are at this point controversial—though to come to terms with them requires a significant shift in intuition from what exists without modern computational thinking.

Data Deducibility

Even if you don’t explicitly know something (say about someone), it can almost always be statistically deduced if there’s enough other related data available

What is a particular person’s gender identity, ethnicity, political persuasion, etc.? Even if one’s not allowed to explicitly ask these questions, it’s basically inevitable that with enough other data about the person, one will be able to deduce what the best answers must be.

Everyone is different in detail. But the point is that there are enough commonalities and correlations between people that it’s basically inevitable that with enough data, one can figure out almost any attribute of a person.

The basic mathematical methods for doing this were already known from classical statistics. But what’s made this now a reality is the availability of vastly more data about people in digital form—as well as the ability of modern machine learning to readily work not just with numerical data, but also with things like textual and image data.

What is the consequence of ubiquitous data deducibility? It means that it’s not useful to block particular pieces of data—say in an attempt to avoid bias—because it’ll essentially always be possible to deduce what that blocked data was. And it’s not just that this can be done intentionally; inside a machine learning system, it’ll often just happen automatically and invisibly.
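The phenomenon can be illustrated with a deliberately tiny synthetic example. The “hidden” attribute below is never given to the deduction function, yet a crude nearest-neighbor lookup over two correlated proxy signals recovers it well above chance:

```python
# Sketch of data deducibility: an attribute that's never stored can often be
# recovered from correlated data. All of this data is synthetic.
people = [
    # (proxy_1, proxy_2, hidden_attribute)
    (1, 1, "A"), (1, 0, "A"), (1, 1, "A"), (0, 1, "A"),
    (0, 0, "B"), (0, 1, "B"), (0, 0, "B"), (1, 0, "B"),
]

def deduce(p1, p2):
    """Majority-vote guess of the hidden attribute from the proxies alone."""
    votes = {}
    for q1, q2, attr in people:
        if (q1, q2) == (p1, p2):
            votes[attr] = votes.get(attr, 0) + 1
    return max(votes, key=votes.get) if votes else None

correct = sum(deduce(p1, p2) == attr for p1, p2, attr in people)
print(f"{correct}/{len(people)} recovered")  # well above the 50% chance level
```

With realistic volumes of data and modern machine learning in place of this lookup, the recovery rate for many personal attributes becomes very high.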

Computational Irreducibility

Even given every detail of a program, it can be arbitrarily hard to predict what it will or won’t do

One might think that if one had the complete code for a program, one would readily be able to deduce everything about what the program would do. But it’s a fundamental fact that in general one can’t do this. Given a particular input, one can always just run the program and see what it does. But even if the program is simple, its behavior may be very complicated, and computational irreducibility implies that there won’t be a way to “jump ahead” and immediately find out what the program will do, without explicitly running it.

One consequence of this is that if one wants to know, for example, whether with any input a program can do such-and-such, then there may be no finite way to determine this—because one might have to check an infinite number of possible inputs. As a practical matter, this is why bugs in programs can be so hard to detect. But as a matter of principle, it means that it can ultimately be impossible to completely verify that a program is “correct”, or has some specific property.
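A classic concrete example (one I have studied at length) is the rule 30 cellular automaton: a program whose complete definition fits in one line, but whose behavior shows no known shortcut. To find the center column at step *t*, one essentially has to run all *t* steps:

```python
# Sketch of computational irreducibility: the rule 30 cellular automaton has a
# trivially simple definition, yet its center column looks effectively random,
# and no way is known to "jump ahead" without running every step.
def rule30_center_column(steps):
    cells = {0}                      # positions of black cells; start with one
    column = []
    for _ in range(steps):
        column.append(int(0 in cells))
        lo, hi = min(cells) - 1, max(cells) + 1
        cells = {
            i for i in range(lo, hi + 1)
            # rule 30: new cell is black iff left XOR (center OR right)
            if (i - 1 in cells) != ((i in cells) or (i + 1 in cells))
        }
    return column

print(rule30_center_column(16))  # begins 1, 1, 0, 1, 1, ...
```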

Software engineering has in the past often tried to constrain the programs it deals with so as to minimize such effects. But with methods like machine learning, this is basically impossible to do. And the result is that even if one had a complete automated content selection program, one wouldn’t in general be able to verify that, for example, it could never show some particular bad behavior.


Non-explainability

For a well-optimized computation, there’s not likely to be a human-understandable narrative about how it works inside

Should we expect to understand how our technological systems work inside? When things like donkeys were routinely part of such systems, people didn’t expect to. But once the systems began to be “completely engineered” with cogs and levers and so on, there developed an assumption that at least in principle one could explain what was going on inside. The same was true with at least simpler software systems. But with things like machine learning systems, it absolutely isn’t.

Yes, one can in principle trace what happens to every bit of data in the program. But can one create a human-understandable narrative about it? It’s a bit like imagining we could trace the firing of every neuron in a person’s brain. We might be able to predict what a person would do in a particular case, but it’s a different thing to get a high-level “psychological narrative” about why they did it.

Inside a machine learning system—say the cats vs. dogs program—one can think of it as extracting all sorts of features, and making all sorts of distinctions. And occasionally one of these features or distinctions might be something we have a word for (“pointedness”, say). But most of the time they’ll be things the machine learning system discovered, and they won’t have any connection to concepts we’re familiar with.

And in fact—as a consequence of computational irreducibility—it’s basically inevitable that with things like the finiteness of human language and human knowledge, in any well-optimized computation we’re not going to be able to give a high-level narrative to explain what it’s doing. And the result of this is that it’s impossible to expect any useful form of general “explainability” for automated content selection systems.

Ethical Incompleteness

There’s no finite set of principles that can completely define any reasonable, practical system of ethics

Let’s say one’s trying to teach ethics to a computer, or an artificial intelligence. Is there some simple set of principles—like Asimov’s Laws of Robotics—that will capture a viable complete system of ethics? Looking at the complexity of human systems of laws one might suspect that the answer is no. And in fact this is presumably a fundamental result—essentially another consequence of computational irreducibility.

Imagine that we’re trying to define constraints (or “laws”) for an artificial intelligence, in order to ensure that the AI behaves in some particular “globally ethical” way. We set up a few constraints, and we find that many things the AI does follow our ethics. But computational irreducibility essentially guarantees that eventually there’ll always be something unexpected that’s possible. And the only way to deal with that is to add a “patch”—essentially to introduce another constraint for that new case. And the issue is that this will never end: there’ll be no way to give a finite set of constraints that will achieve our global objectives. (There’s a somewhat technical analogy of this in mathematics, in which Gödel’s theorem shows that no finite set of axiomatic constraints can give one only ordinary integers and nothing else.)

So for our purposes here, the main consequence of this is that we can’t expect to have some finite set of computational principles (or, for that matter, laws) that will constrain automated content selection systems to always behave according to some reasonable, global system of ethics—because they’ll always be generating unexpected new cases that we have to define a new principle to handle.

The Path Forward

I’ve described some of the complexities of handling issues with automated content selection systems. But what in practice can be done?

One obvious idea would be just to somehow “look inside” the systems, auditing their internal operation and examining their construction. But for both fundamental and practical reasons, I don’t think this can usefully be done. As I’ve discussed, to achieve the kind of functionality that users have become accustomed to, modern automated content selection systems make use of methods such as machine learning that are not amenable to human-level explainability or systematic predictability.

What about checking whether a system is, for example, biased in some way? Again, this is a fundamentally difficult thing to determine. Given a particular definition of bias, one could look at the internal training data used for the system—but this won’t usually give more information than just studying how the system behaves.

What about seeing if the system has somehow intentionally been made to do this or that? It’s conceivable that the source code could have explicit “if” statements that would reveal intention. But the bulk of the system will tend to consist of trained neural networks and so on—and as in most other complex systems, it’ll typically be impossible to tell what features might have been inserted “on purpose” and what are just accidental or emergent properties.

So if it’s not going to work to “look inside” the system, what about restricting how the system can be set up? For example, one approach that’s been suggested is to limit the inputs that the system can have, in an extreme case preventing it from getting any personal information about the user and their history. The problem with this is that it negates what’s been achieved over the course of many years in content selection systems—both in terms of user experience and economic success. And for example, knowing nothing about a user, if one has to recommend a video, one’s just going to have to suggest whatever video is generically most popular—which is very unlikely to be what most users want most of the time.

As a variant of the idea of blocking all personal information, one can imagine blocking just some information—or, say, allowing a third party to broker what information is provided. But if one wants to get the advantages of modern content selection methods, one’s going to have to leave a significant amount of information—and then there’s no point in blocking anything, because it’ll almost certainly be reproducible through the phenomenon of data deducibility.

Here’s another approach: what about just defining rules (in the form of computational contracts) that specify constraints on the results content selection systems can produce? One day, we’re going to have to have such computational contracts to define what we want AIs in general to do. And because of ethical incompleteness—like with human laws—we’re going to have to have an expanding collection of such contracts.

But even though (particularly through my own efforts) we’re beginning to have the kind of computational language necessary to specify a broad range of computational contracts, we realistically have to get much more experience with computational contracts in standard business and other situations before it makes sense to try setting them up for something as complex as global constraints on content selection systems.

So, what can we do? I’ve not been able to see a viable, purely technical solution. But I have formulated two possible suggestions based on mixing technical ideas with what amount to market mechanisms.

The basic principle of both suggestions is to give users a choice about who to trust, and to let the final results they see not necessarily be completely determined by the underlying ACS business.

There’s been debate about whether ACS businesses are operating as “platforms” that more or less blindly deliver content, or whether they’re operating as “publishers” who take responsibility for content they deliver. Part of this debate can be seen as being about what responsibility should be taken for an AI. But my suggestions sidestep this issue, and in different ways tease apart the “platform” and “publisher” roles.

It’s worth saying that the whole content platform infrastructure that’s been built by the large ACS businesses is an impressive and very valuable piece of engineering—managing huge amounts of content, efficiently delivering ads against it, and so on. What’s really at issue is whether the fine details of the ACS systems need to be handled by the same businesses, or whether they can be opened up. (This is relevant only for ACS businesses whose network effects have allowed them to serve a large fraction of a population. Small ACS businesses don’t have the same kind of lock-in.)

Suggestion A: Allow Users to Choose among Final Ranking Providers

Suggestion A

As I discussed earlier, the rough (and oversimplified) outline of how a typical ACS system works is that first features are extracted for each content item and each user. Then, based on these features, there’s a final ranking done that determines what will actually be shown to the user, in what order, etc.

What I’m suggesting is that this final ranking doesn’t have to be done by the same entity that sets up the infrastructure and extracts the features. Instead, there could be a single content platform but a variety of “final ranking providers”, who take the features, and then use their own programs to actually deliver a final ranking.

Different final ranking providers might use different methods, and emphasize different kinds of content. But the point is to let users be free to choose among different providers. Some users might prefer (or trust more) some particular provider—that might or might not be associated with some existing brand. Other users might prefer another provider, or choose to see results from multiple providers.

How technically would all this be implemented? The underlying content platform (presumably associated with an existing ACS business) would take on the large-scale information-handling task of deriving extracted features. The content platform would provide sufficient examples of underlying content (and user information) and its extracted features to allow the final ranking provider’s systems to “learn the meaning” of the features.

When the system is running, the content platform would in real time deliver extracted features to the final ranking provider, which would then feed this into whatever system they have developed (which could use whatever automated or human selection methods they choose). This system would generate a ranking of content items, which would then be fed back to the content platform for final display to the user.

To avoid revealing private user information to lots of different providers, the final ranking provider’s system should probably run on the content platform’s infrastructure. The content platform would be responsible for the overall user experience, presumably providing some kind of selector to pick among final ranking providers. The content platform would also be responsible for delivering ads against the selected content.

Presumably the content platform would give a commission to the final ranking provider. If properly set up, competition among final ranking providers could actually increase total revenue to the whole ACS business, by achieving automated content selection that serves users and advertisers better.

Suggestion B: Allow Users to Choose among Constraint Providers

Suggestion B

One feature of Suggestion A is that it breaks up ACS businesses into a content platform component, and a final ranking component. (One could still imagine, however, that a quasi-independent part of an ACS business could be one of the competing final ranking providers.) An alternative suggestion is to keep ACS businesses intact, but to put constraints on the results that they generate, for example forcing certain kinds of balance, etc.

Much like final ranking providers, there would be constraint providers who define sets of constraints. For example, a constraint provider could require that there be on average an equal number of items delivered to a user that are classified (say, by a particular machine learning system) as politically left-leaning or politically right-leaning.

Constraint providers would effectively define computational contracts about properties they want results delivered to users to have. Different constraint providers would define different computational contracts. Some might want balance; others might want to promote particular types of content, and so on. But the idea is that users could decide what constraint provider they wish to use.

How would constraint providers interact with ACS businesses? It’s more complicated than for final ranking providers in Suggestion A, because effectively the constraints from constraint providers have to be woven deeply into the basic operation of the ACS system.

One possible approach is to use the machine learning character of ACS systems, and to insert the constraints as part of the “learning objectives” (or, technically, “loss functions”) for the system. Of course, there could be constraints that just can’t be successfully learned (for example, they might call for types of content that simply don’t exist). But there will be a wide range of acceptable constraints, and in effect, for each one, a different ACS system would be built.

All these ACS systems would then be operated by the underlying ACS business, with users selecting which constraint provider—and therefore which overall ACS system—they want to use.

As with Suggestion A, the underlying ACS business would be responsible for delivering advertising, and would pay a commission to the constraint provider.

Although their detailed mechanisms are different, both Suggestions A and B attempt to leverage the exceptional engineering and commercial achievements of the ACS businesses, while diffusing current trust issues about content selection, providing greater freedom for users, and inserting new opportunities for market growth.

The suggestions also help with some other issues. One example is the banning of content providers. At present, with ACS businesses feeling responsible for content on their platforms, there is considerable pressure, not least from within the ACS businesses themselves, to ban content providers that they feel are providing inappropriate content. The suggestions diffuse the responsibility for content, potentially allowing the underlying ACS businesses not to ban anything but explicitly illegal content.

It would then be up to the final ranking providers, or the constraint providers, to choose whether or not to deliver or allow content of a particular character, or from a particular content provider. In any given case, some might deliver or allow it, and some might not, removing the difficult all-or-none nature of the banning that’s currently done by ACS businesses.

One feature of my suggestions is that they allow fragmentation of users into groups with different preferences. At present, all users of a particular ACS business have content that is basically selected in the same way. With my suggestions, users of different persuasions could potentially receive completely different content, selected in different ways.

While fragmentation like this appears to be an almost universal tendency in human society, some might argue that having people routinely be exposed to other people’s points of view is important for the cohesiveness of society. And technically some version of this would not be difficult to achieve. For example, one could take the final ranking or constraint providers, and effectively generate a feature space plot of what they do.

Some would be clustered close together, because they lead to similar results. Others would be far apart in feature space—in effect representing very different points of view. Then if someone wanted to, say, see their typical content 80% of the time, but see different points of view 20% of the time, the system could combine different providers from different parts of feature space with a certain probability.
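A sketch of how that probabilistic mixing might work, in Python. The provider positions, the near/far cutoff at the median distance, and the 80/20 split are all invented for illustration:

```python
import random
import math

random.seed(42)

# Hypothetical 2-D "feature space" positions for six final ranking providers.
# Clustered providers select content in similar ways; distant ones differ.
PROVIDERS = {
    "A": (0.0, 0.0), "B": (0.2, 0.1), "C": (0.1, 0.3),   # one cluster
    "D": (5.0, 5.0), "E": (5.2, 4.8), "F": (4.9, 5.3),   # a distant cluster
}

def pick_provider(home, p_other=0.2):
    """With probability 1 - p_other use a provider near `home` in feature
    space; otherwise deliberately pick one far away (a different viewpoint)."""
    hx, hy = PROVIDERS[home]
    dist = {k: math.hypot(x - hx, y - hy) for k, (x, y) in PROVIDERS.items()}
    cutoff = sorted(dist.values())[len(dist) // 2]  # split near/far at the median
    near = [k for k, d in dist.items() if d < cutoff]
    far = [k for k, d in dist.items() if d >= cutoff]
    pool = far if random.random() < p_other else near
    return random.choice(pool)

counts = {k: 0 for k in PROVIDERS}
for _ in range(1000):
    counts[pick_provider("A")] += 1
far_share = (counts["D"] + counts["E"] + counts["F"]) / 1000
```

Over many selections, a user "homed" on provider A sees the distant cluster about 20% of the time, which is the 80/20 mixing described above.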

Of course, in all these matters, the full technical story is much more complex. But I am confident that if they are considered desirable, either of the suggestions I have made can be implemented in practice. (Suggestion A is likely to be somewhat easier to implement than Suggestion B.) The result, I believe, will be richer, more trusted, and even more widely used automated content selection. In effect both my suggestions mix the capabilities of humans and AIs—to help get the best of both of them—and to navigate through the complex practical and fundamental problems with the use of automated content selection.

My Part in an Origin Story: The Launching of the Santa Fe Institute
Tue, 18 Jun 2019 19:36:02 +0000
Stephen Wolfram

Launching the Santa Fe Institute

The first workshop to define what is now the Santa Fe Institute took place on October 5–6, 1984. I was recently asked to give some reminiscences of the event, for a republication of a collection of papers derived from this and subsequent workshops.

It was a slightly dark room, decorated with Native American artifacts. Around it were tables arranged in a large rectangle, at which sat a couple dozen men (yes, all men), mostly in their sixties. The afternoon was wearing on, with many different people giving their various views about how to organize what amounted to a putative great new interdisciplinary university.

Here’s the original seating chart, together with a current view of the meeting room. (I’m only “Steve” to Americans currently over the age of 60…):

Santa Fe seating chart
Dobkin Boardroom

I think I was less patient in those days. But eventually I could stand it no longer. I don’t remember my exact words, but they boiled down to: “What are you going to do if you only raise a few million dollars, not two billion?” It was a strange moment. After all, I was by far the youngest person there—at 25 years old—and yet it seemed to have fallen to me to play the “let’s get real” role. (To be fair, I had founded my first tech company a couple of years earlier, and wasn’t a complete stranger to the world of grandiose “what-if” discussions, even if I was surprised, though more than a little charmed, to be seeing them in the sixty-something-year-old set.)

A fragment of my notes from the day record my feelings:

What is supposed to be the point of this discussion?

George Cowan (Manhattan Project alum, Los Alamos administrator, and founder of the Los Alamos Bank) was running the meeting, and I sensed a mixture of frustration and relief at my question. I don’t remember precisely what he said, but it boiled down to: “Well, what do you think we should do?” “Well”, I said, “I do have a suggestion”. I summarized it a bit, but then it was agreed that later that day I should give a more formal presentation. And that’s basically how I came to suggest that what would become the Santa Fe Institute should focus on what I called “Complex Systems Theory”.

Of course, there was a whole backstory to this. It basically began in 1972, when I was 12 years old, and saw the cover of a college physics textbook that purported to show an arrangement of simulated colliding molecules progressively becoming more random. I was fascinated by this phenomenon, and quite soon started trying to use a computer to understand it. I didn’t get too far with this. But it was the golden age of particle physics, and I was soon swept up in publishing papers about a variety of topics in particle physics and cosmology.

Still, in all sorts of different ways I kept on coming back to my interest in how randomness—or complexity—gets produced. In 1978 I went to Caltech as a graduate student, with Murray Gell-Mann (inventor of quarks, and the first chairman of the Santa Fe Institute) doing his part to recruit me by successfully tracking down a phone number for me in England. Then in 1979, as a way to help get physics done, I set about building my first large-scale computer language. In 1981 the first version was finished and I was installed as a faculty member at Caltech—and I decided it was time for me to try something more ambitious, and really see what I could figure out about my old interest in randomness and complexity.

By then I had picked away at many examples of complexity. In self-gravitating gases. In dendritic crystal growth. In road traffic flow. In neural networks. But the reductionist physicist in me wanted to drill down and find out what was underneath all these. And meanwhile the computer language designer in me thought, “Let’s just invent something and see what can be done with it”. Well, pretty soon I invented what I later found out were called cellular automata.

I didn’t expect that simple cellular automata would do anything particularly interesting. But I decided to try computer experiments on them anyway. And to my great surprise I discovered that—despite the simplicity of their construction—cellular automata can in fact produce behavior of great complexity. It’s a major shock to traditional scientific intuition—and, as I came to realize in later years, a clue to a whole new kind of science.

But for me the period from 1981 to 1984 was an exciting one, as I began to explore the computational universe of simple programs like cellular automata, and saw just how rich and unexpected it was. David Pines, as the editor of Reviews of Modern Physics, had done me the favor of publishing my first big paper on cellular automata (John Maddox, editor of Nature, had published a short summary a little earlier). Through the Center for Nonlinear Studies, I had started making visits to Los Alamos in 1981, and I initiated and co-organized the first-ever conference devoted to cellular automata, held at Los Alamos in 1983.

In 1983 I left Caltech (primarily as a result of an unhappy interaction about intellectual property rights) and went to the Institute for Advanced Study in Princeton, where I began to build a group concerned with studying the basic science of complex systems. I wasn't sure until quite a few years later just how general the phenomena I'd seen in cellular automata were. But I was pretty certain that there were many examples of complexity, across all sorts of fields, that cellular automata would finally let one explain in a fundamental, theoretical way.

I’m not sure when I first heard about plans for what was then called the Rio Grande Institute. But I remember not being very hopeful about it; it seemed too correlated with the retirement plans of a group of older physicists. But meanwhile, people like Pete Carruthers (director of T Division at Los Alamos) were encouraging me to think about starting my own institute to pursue the kind of science I thought could be done.

I didn’t know quite what to make of the letter I received in July 1984 from Nick Metropolis (long-time Los Alamos scientist, and inventor of the Metropolis method). It described the nascent Rio Grande Institute as “a teaching and research institution responsive to the challenge of emerging new syntheses in science”. Murray Gell-Mann had told me that it would bring together physics and archaeology, linguistics and cosmology, and more. But at least in the circulated documents, the word “complexity” appeared quite often.

Letter from Los Alamos

The invitation described the workshop as being “to examine a concept for a fresh approach to research and teaching in rapidly developing fields of scientific activity dealing with highly complex, interactive systems”. Murray Gell-Mann, who had become a sort of de facto intellectual leader of the effort, was given to quite flowery descriptions, and declared that the institute would be involved with “simplicity and complexity”.

When I arrived at the workshop it was clear that everyone wanted their favorite field to get a piece of the potential action. Should I even bring up my favorite emerging field? Or should I just make a few comments about computers and let the older guys do their thing?

As I listened to the talks and discussions, I kept wondering how what I was studying might relate to them. Quite often I really didn’t know. At the time I still believed, for example, that adaptive systems might have fundamentally different characteristics. But still, the term “complexity” kept on coming up. And if the Rio Grande Institute needed an area to concentrate on, it seemed that a general study of complexity would be the closest to being central to everything they were talking about.

I’m not sure quite what the people in the room made of my speech about “complex systems theory”. But I think I did succeed in making the point that there really could be a general “science of complexity”—and that things like cellular automata could show one how it might work. People had been talking about the complexity of this, or the complexity of that. But it seemed like I’d at least started the process of getting people to talk about complexity as an abstract thing one could expect to have general theories about.

After that first workshop, I had a few more interactions with what was to be the Santa Fe Institute. I still wasn’t sure what was going to happen with it—but the “science of complexity” idea did seem to be sticking. Meanwhile, however, I was forging ahead with my own plans to start a complex systems institute (I avoided the term “complexity theory” out of deference to the rather different field of computational complexity theory). I was talking to all sorts of universities, and in fact David Pines was encouraging me to consider the University of Illinois.

George Cowan asked me if I’d be interested in running the research program for the Santa Fe Institute, but by that point I was committed to starting my own operation, and it wasn’t long afterwards that I decided to do it at the University of Illinois. My Center for Complex Systems Research—and my journal Complex Systems—began operations in the summer of 1986.

Complex Systems

I’m not sure how things would have been different if I’d ended up working with the Santa Fe Institute. But as it was, I rather quickly tired of the effort to raise money for complex systems research, and I was soon off creating what became Mathematica (and now the Wolfram Language), and starting my company Wolfram Research.

By the early 1990s, probably in no small part through the efforts of the Santa Fe Institute, “complexity” had actually become a popular buzzword, and, partly through a rather circuitous connection to climate science, funding had started pouring in. But having launched Mathematica and my company, I’d personally pretty much vanished from the scene, working quietly on using the tools I’d created to pursue my interests in basic science. I thought it would only take a couple of years, but in the end it took more than a decade.

I discovered a lot—and realized that, yes, the phenomena I’d first seen with cellular automata and talked about at the Santa Fe workshop were indeed a clue to a whole new kind of science, with all sorts of implications for long-standing problems and for the future. I packaged up what I’d figured out—and in 2002 published my magnum opus A New Kind of Science.

A New Kind of Science

It was strange to reemerge after a decade and a half away. The Santa Fe Institute had continued to pursue the science of complexity. As something of a hermit in those years, I hadn’t interacted with it—but there was curiosity about what I was doing (highlighted, if nothing else, by a bizarre incident in 1998 involving “leaks” about my research). When my book came out in 2002 I was pleased that I thought I’d actually done what I talked about doing back at that Santa Fe workshop in 1984—as well as much more.

But by then almost nobody who’d been there in 1984 was still involved with the Santa Fe Institute, and instead there was a “new guard” (now, I believe, again departed), who, far from being pleased with my progress and success in broadening the field, actually responded with rather unseemly hostility.

It’s been an interesting journey from those days in October 1984. Today complex systems research is very definitely “a thing”, and there are hundreds of “complex systems” institutes around the world. (Though I still don’t think the basic science of complexity, as opposed to its applications, has received the attention it should.) But the Santa Fe Institute remains the prototypical example—and it’s not uncommon when I talk about complexity research for people to ask, “Is that like what the Santa Fe Institute does?”

“Well actually”, I sometimes say, “there’s a little footnote to history about that”. And off I go, talking about that Saturday afternoon back in October 1984—when I could be reached (as the notes I distributed said) through that newfangled thing called email at ias!swolf

Stephen Wolfram's notes on complex systems

A Few Thoughts about Deep Fakes
Wed, 12 Jun 2019 23:55:38 +0000
Stephen Wolfram

Someone from the House Permanent Select Committee on Intelligence recently contacted me about a hearing they’re having on the subject of deep fakes. I can’t attend the hearing, but the conversation got me thinking about the subject of deep fakes, and I made a few quick notes….

What You See May Not Be What Happened

The idea of modifying images is as old as photography. At first, it had to be done by hand (sometimes with airbrushing). By the 1990s, it was routinely being done with image manipulation software such as Photoshop. But it’s something of an art to get a convincing result, say for a person inserted into a scene. And if, for example, the lighting or shadows don’t agree, it’s easy to tell that what one has isn’t real.

What about videos? If one does motion capture, and spends enough effort, it’s perfectly possible to get quite convincing results—say for animating aliens, or for putting dead actors into movies. The way this works, at least in a first approximation, is for example to painstakingly pick out the keypoints on one face, and map them onto another.

What’s new in the past couple of years is that this process can basically be automated using machine learning. And, for example, there are now neural nets that are simply trained to do “face swapping”:

Face swap

In essence, what these neural nets do is to fit an internal model to one face, and then apply it to the other. The parameters of the model are in effect learned from looking at lots of real-world scenes, and seeing what’s needed to reproduce them. The current approaches typically use generative adversarial networks (GANs), in which there’s iteration between two networks: one trying to generate a result, and one trying to discriminate that result from a real one.
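To make the iteration between the two networks concrete, here is a deliberately minimal one-dimensional GAN sketch in Python (with NumPy). The "generator" just learns an offset added to noise, and the "discriminator" is a single logistic unit; real face-swapping networks are enormously larger, but the adversarial loop has the same shape. The setup and all the numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN = 4.0  # "real data": samples from a normal distribution centered at 4

theta = 0.0      # generator parameter: fake sample = noise + theta
w, b = 0.0, 0.0  # discriminator: D(x) = sigmoid(w*x + b), "probability real"

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.05
for _ in range(2000):
    real = rng.normal(REAL_MEAN, 1.0, 64)
    fake = rng.normal(0.0, 1.0, 64) + theta

    # Discriminator step: ascend log D(real) + log(1 - D(fake))
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: ascend log D(fake) (the non-saturating GAN objective)
    fake = rng.normal(0.0, 1.0, 64) + theta
    theta += lr * np.mean((1 - sigmoid(w * fake + b)) * w)

# Near equilibrium the generated distribution matches the real one, so
# theta should end up near REAL_MEAN, and D should be reduced to guessing.
```

The endpoint of this tug-of-war is exactly the property discussed below: once the discriminator can no longer separate fake from real, the generator's output carries no simple statistical signature of being machine made.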

Today’s examples are far from perfect, and it’s not too hard for a human to tell that something isn’t right. But even just as a result of engineering tweaks and faster computers, there’s been progressive improvement, and there’s no reason to think that within a modest amount of time it won’t be possible to routinely produce human-indistinguishable results.

Can Machine Learning Police Itself?

OK, so maybe a human won’t immediately be able to tell what’s real and what’s not. But why not have a machine do it? Surely there’s some signature of something being “machine generated”. Surely there’s something about a machine-generated image that’s statistically implausible for a real image.

Well, not naturally. Because, in fact, the whole way the machine images are generated is by having models that as faithfully as possible reproduce the “statistics” of real images. Indeed, inside a GAN there’s explicitly a “fake or not” discriminator. And the whole point of the GAN is to iterate until the discriminator can’t tell the difference between what’s being generated, and something real.

Could one find some other feature of an image that the GAN isn’t paying attention to—like whether a face is symmetric enough, or whether writing in the background is readable? Sure. But at this level it’s just an arms race: having identified a feature, one puts it into the model the neural net is using, and then one can’t use that feature to discriminate any more.

There are limitations to this, however. Because there’s a limit to what a typical neural net can learn. Generally, neural nets do well at tasks like image recognition that humans do without thinking. But it’s a different story if one tries to get neural nets to do math, and for example factor numbers.

Imagine that in modifying a video one has to fill in a background that’s showing some elaborate computation—say a mathematical one. Well, then a standard neural net basically doesn’t stand a chance.

Will it be easy to tell that it’s getting it wrong? It could be. If one’s dealing with public-key cryptography, or digital signatures, one can certainly imagine setting things up so that it’s very hard to generate something that is correct, but easy to check whether it is.

But will this kind of thing show up in real images or videos? My own scientific work has actually shown that irreducibly complex computation can be quite ubiquitous even in systems with very simple rules—and presumably in many systems in nature. Watch a splash in water. It takes a complex computation to figure out the details of what’s going to happen. And while a neural net might be able to get something that basically looks like a splash, it’d be vastly harder for it to get the details of a particular splash right.

But even though in the abstract computational irreducibility may be common, we humans, in our evolution and the environments we set up for ourselves, tend to end up doing our best to avoid it. We have shapes with smooth curves. We build things with simple geometries. We try to make things evolvable or understandable.  And it’s this avoidance of computational irreducibility that makes it feasible for neural nets to successfully model things like the visual scenes in which we typically find ourselves.

One can disrupt this, of course. Just put in the picture a display that’s showing some sophisticated computation (even, for example, a cellular automaton). If someone tries to fake some aspect of this with a neural net, it won’t (at least on its own) feasibly be able to get the details right.
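By way of illustration, here is a minimal Python sketch of just such a computation: the rule 30 cellular automaton, whose one-line update rule (new cell = left XOR (center OR right)) produces a pattern whose fine details are, as far as anyone knows, computationally irreducible, and so infeasible for a neural net to fake convincingly:

```python
def rule30_step(cells):
    """One step of the rule 30 cellular automaton (cyclic boundary)."""
    n = len(cells)
    return [cells[i - 1] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

width, steps = 31, 15
row = [0] * width
row[width // 2] = 1  # start from a single black cell
for _ in range(steps):
    print("".join("#" if c else "." for c in row))
    row = rule30_step(row)
```

From that single black cell the rule generates the familiar triangular pattern with an apparently random interior; reproducing any particular region of it requires actually running the rule, not statistically modeling it.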

I suspect that in the future of human technology—as we mine deeper in the computational universe—irreducible computation will be much more common in what we build. But as of now, it’s still rare in typical human-related situations. And as a result, we can expect that neural nets will successfully be able to model what’s going on well enough to at least fool other neural nets.

How to Know What’s Real

So if there’s no way to analyze the bits in an image to tell if it’s a real photograph, does that mean we just can’t tell? No. Because we can also think about metadata associated with the image—and about the provenance of the image. When was the image created? By whom? And so on.

So let’s say we create an image. How can we set things up so that we can prove when we did it? Well, in modern times it’s actually very easy. We take the image, and compute a cryptographic hash from it (effectively by applying a mathematical operation that derives a number from the bits in the image). Then we take this hash and put it on a blockchain.

The blockchain acts as a permanent ledger. Once we’ve put data on it, it can never be changed, and we can always go back and see what the data was, and when it was added to the blockchain.

This setup lets us prove that the image was created no later than a certain time. If we want to prove that the image wasn’t created earlier, then when we create the hash for the image, we can throw in a hash from the latest block on our favorite blockchain.
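The basic "hash it and anchor it" step is only a few lines of code. Here is a sketch in Python using only the standard library; an ordinary list stands in for a real public blockchain, the fixed timestamp is just for reproducibility, and the function names are invented for the example:

```python
import hashlib
import json

ledger = []  # stand-in for a public blockchain: an append-only list of blocks

def latest_block_hash():
    return ledger[-1]["hash"] if ledger else "0" * 64

def register_image(image_bytes):
    """Anchor an image on the ledger. Including the previous block's hash in
    the record shows the entry could not have been made before that block
    existed ("not earlier"); the entry's position proves "no later than"."""
    record = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "prev_block": latest_block_hash(),
        "timestamp": 1560000000,  # fixed for reproducibility
    }
    block_hash = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    ledger.append({"record": record, "hash": block_hash})
    return block_hash

def verify_image(image_bytes, block_hash):
    """What an image viewer would do: recompute the hash and look it up."""
    h = hashlib.sha256(image_bytes).hexdigest()
    return any(b["hash"] == block_hash and b["record"]["image_sha256"] == h
               for b in ledger)
```

Any change to even one bit of the image changes its SHA-256 hash, so a tampered image fails verification against the ledger entry.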

OK, but what about knowing who created the image? It takes a bit of cryptographic infrastructure—very similar to what's done in proving the authenticity of websites. But if one can trust some "certificate authority", then one can associate a digital signature with the image that validates who created it.

But how about knowing where the image was taken? Assuming one has a certain level of access to the device or the software, GPS can be spoofed. If one records enough about the environment when the image was taken, then it gets harder and harder to spoof. What were the nearby Wi-Fi networks? The Bluetooth pings? The temperature? The barometric pressure? The sound level? The accelerometer readings? If one has enough information collected, then it becomes easier to tell if something doesn’t fit.

There are several ways one could do this. Perhaps one could just detect anomalies using machine learning. Or perhaps one could use actual models of how the world works (the path implied by the accelerometer isn’t consistent with the equations of mechanics, etc.). Or one could somehow tie the information to some public computational fact. Was the weather really like that in the place the photo was said to be taken? Why isn’t there a shadow from such-and-such a plane going overhead? Why is what’s playing on the television not what it should be? Etc.

But, OK, even if one just restricts oneself to creation time and creator ID, how can one in practice validate them?

The best scheme seems to be something like how modern browsers handle website security. The browser tries to check the cryptographic signature of the website. If it matches, the browser shows something to say the website is secure; if not, it shows some kind of warning.

So let’s say an image comes with data on its creation time and creator ID. The data could be metadata (say EXIF data), or it could be a watermark imprinted on the detailed bits in the image. Then the image viewer (say in the browser) can check whether the hash on a blockchain agrees with what the data provided by the image implies. If it does, fine. And the image viewer can make the creation time and creator ID available. If not, the image viewer should warn the user that something seems to be wrong.

Exactly the same kind of thing can be done with videos. It just requires video players computing hashes on the video, and comparing to what’s on a blockchain. And by doing this, one can guarantee, for example, that one’s seeing a whole video that was made at a certain time.

How would this work in practice? Probably people often wouldn’t want to see all the raw video taken at some event. But a news organization, for example, could let people click through to it if they wanted. And one can easily imagine digital signature mechanisms that could be used to guarantee that an edited video, for example, contained no content not in certain source videos, and involved, say, specified contiguous chunks from these source videos.

The Path Forward

So, where does this leave us with deep fakes? Machine learning on its own won’t save us. There’s not going to be a pure “fake or not” detector that can run on any image or video. Yes, there’ll be ways to protect oneself against being “faked” by doing things like wearing a live cellular automaton tie. But the real way to combat deep fakes, I think, is to use blockchain technology—and to store on a public ledger cryptographic hashes of both images and sensor data from the environment where the images were acquired. The very presence of a hash can guarantee when an image was acquired; “triangulating” from sensor and other data can give confidence that what one is seeing was something that actually happened in the real world.

Of course, there are lots of technical details to work out. But in time I’d expect image and video viewers could routinely check against blockchains (and “data triangulation computations”), a bit like how web browsers now check security certificates. And today’s “pics or it didn’t happen” will turn into “if it’s not on the blockchain it didn’t happen”.

The Wolfram Function Repository: Launching an Open Platform for Extending the Wolfram Language
Tue, 11 Jun 2019 14:00:08 +0000
Stephen Wolfram

What the Wolfram Language Makes Possible

We’re on an exciting path these days with the Wolfram Language. Just three weeks ago we launched the Free Wolfram Engine for Developers to help people integrate the Wolfram Language into large-scale software projects. Now, today, we’re launching the Wolfram Function Repository to provide an organized platform for functions that are built to extend the Wolfram Language—and we’re opening up the Function Repository for anyone to contribute.

The Wolfram Function Repository is something that’s made possible by the unique nature of the Wolfram Language as not just a programming language, but a full-scale computational language. In a traditional programming language, adding significant new functionality typically involves building whole libraries, which may or may not work together. But in the Wolfram Language, there’s so much already built into the language that it’s possible to add significant functionality just by introducing individual new functions—which can immediately integrate into the coherent design of the whole language.

To get it started, we’ve already got 532 functions in the Wolfram Function Repository, in 26 categories:

The Wolfram Function Repository

Just like the 6000+ functions that are built into the Wolfram Language, each function in the Function Repository has a documentation page, with a description and examples:


Go to the page, click to copy the “function blob”, paste it into your input, and then use the function just like a built-in Wolfram Language function (all necessary downloading etc. is already handled automatically in Version 12.0):


ResourceFunction["LogoQRCode"]["", CloudGet[""]]

And what’s critical here is that in introducing LogoQRCode you don’t, for example, have to set up a “library to handle images”: there’s already a consistent and carefully designed way to represent and work with images in the Wolfram Language—that immediately fits in with everything else in the language:


ColorNegate[CloudGet[""]]], #^k &], {k, 1, 2, .25}]

I’m hoping that—with the help of the amazing and talented community that’s grown up around the Wolfram Language over the past few decades—the Wolfram Function Repository is going to allow rapid and dramatic expansion in the range of (potentially very specialized) functions available for the language. Everything will leverage both the content of the language, and the design principles that the language embodies. (And, of course, the Wolfram Language has a 30+ year history of design stability.)

Inside the functions in the Function Repository there may be tiny pieces of Wolfram Language code, or huge amounts. There may be calls to external APIs and services, or to external libraries in other languages. But the point is that when it comes to user-level functionality everything will fit together, because it’s all based on the consistent design of the Wolfram Language—and every function will automatically “just work”.

We’ve set it up to be as easy as possible to contribute to the Wolfram Function Repository—essentially just by filling out a simple notebook. There’s automation that helps ensure that everything meets our design guidelines. And we’re focusing on coverage, not depth—and (though we’re putting in place an expert review process) we’re not insisting on anything like the same kind of painstaking design analysis or the same rigorous standards of completeness and robustness that we apply to built-in functions in the language.

There are lots of tradeoffs and details. But our goal is to optimize the Wolfram Function Repository both for utility to users, and for ease of contribution. As it grows, I’ve no doubt that we’ll have to invent new mechanisms, not least for organizing a large number of functions, and finding the ones one wants. But it’s very encouraging to see that it’s off to such a good start. I myself contributed a number of functions to the initial collection. Many are based on code that I’ve had for a long time. It only took me minutes to submit them to the Repository. But now that they’re in the Repository, I can—for the first time ever—immediately use the functions whenever I want, without worrying about finding files, loading packages, etc.

Low Cost, High Payoff

We’ve had ways for people to share Wolfram Language code since even before the web (our first major centralized effort was MathSource, built for Mathematica in 1991, using CD-ROMs, etc.). But there’s something qualitatively different—and much more powerful—about the Wolfram Function Repository.

We’ve worked very hard for more than 30 years to maintain the design integrity of the Wolfram Language, and this has been crucial in allowing the Wolfram Language to become not just a programming language, but a full-scale computational language. And now what the Wolfram Function Repository does is to leverage all this design effort to let new functions be added that fit consistently into the framework of the language.

Inside the implementation of each function, all sorts of things can be going on. But what’s critical is that to the user, the function is presented in a very definite and uniform way. In a sense, the built-in functions of the Wolfram Language provide 6000+ consistent examples of how functions should be designed (and our livestreamed design reviews include hundreds of hours of the process of doing that design). But more than that, what ultimately makes the Wolfram Function Repository able to work well is the symbolic character of the Wolfram Language, and all the very rich structures that are already built into the language. If you’ve got a function that deals with images—or sparse arrays, or molecular structures, or geo positions, or whatever—there’s already a consistent symbolic representation of those in the language, and by using that, your function is immediately compatible with other functions in the system.

Setting up a repository that really works well is an interesting meta-design problem. Give too little freedom and one can’t get the functionality one wants. Give too much freedom and one won’t be able to maintain enough consistency. We’ve had several previous examples that have worked very well. The Wolfram Demonstrations Project—launched in 2007 and now (finally) running interactively on the web—contains more than 12,000 contributed interactive demonstrations. The Wolfram Data Repository has 600+ datasets that can immediately be used in the Wolfram Language. And the Wolfram Neural Net Repository adds neural nets by the week (118 so far) that immediately plug into the NetModel function in the Wolfram Language.

All these examples have the feature that the kind of thing that's being collected is well collimated. Yes, the details of what actual Demonstration or neural net or whatever one has can vary a lot, but the fundamental structure for any given repository is always the same. So what about a repository that adds extensions to the Wolfram Language? The Wolfram Language is set up to be extremely flexible—so it can basically be extended and changed in any way. And this is tremendously important in making it possible to quickly build all sorts of large-scale systems in the Wolfram Language. But with this flexibility comes a cost. Because the more one makes use of it, the more one ends up with a separate tower of functionality—and the less one can expect that (without tremendous design effort) what one builds will consistently fit in with everything else.

In traditional programming languages, there’s already a very common problem with libraries. If you use one library, it might be OK. But if you try to use several, there’s no guarantee that they fit together. Of course, it doesn’t help that in a traditional programming language—as opposed to a full computational language—there’s no expectation of even having consistent built-in representations for anything but basic data structures. But the problem is bigger than that: whenever one builds a large-scale tower of functionality, then without the kind of immense centralized design effort that we’ve put into the Wolfram Language, one won’t be able to achieve the consistency and coherence needed for everything to always work well together.

So the idea of the Wolfram Function Repository is to avoid this problem by just adding bite-sized extensions in the form of individual functions—that are much easier to design in a consistent way. Yes, there are things that cannot conveniently be done with individual functions (and we’re soon going to be releasing a streamlined mechanism for distributing larger-scale packages). But with everything that’s already built into the Wolfram Language there’s an amazing amount that individual functions can do. And the idea is that with modest effort it’s possible to create very useful functions that maintain enough design consistency that they fit together and can be easily and widely used.

It’s a tradeoff, of course. With a larger-scale package one can introduce a whole new world of functionality, which can be extremely powerful and valuable. But if one wants to have new functionality that will fit in with everything else, then—unless one’s prepared to spend immense design effort—it’ll have to be smaller scale. The idea of the Wolfram Function Repository is to hit a particular sweet spot that allows for powerful functionality to be added while making it manageably easy to maintain good design consistency.

Contributing to the Repository

We’ve worked hard to make it easy to contribute to the Wolfram Function Repository. On the desktop (already in Version 12.0), you can just go to File > New > Repository Item > Function Repository Item and you’ll get a “Definition Notebook” (programmatically, you can also use CreateNotebook["FunctionResource"]):

Definition notebook

There are two basic things you have to do: first, actually give the code for your function and, second, give documentation that shows how the function should be used.

Press the Open Sample button at the top to see an example of what you need to do:


Essentially, you’re trying to make something that’s like a built-in function in the Wolfram Language. Except that it can be doing something much more specific than a built-in function ever would. And the expectations for how complete and robust it is are much lower.

But you’ll need a name for your function, that fits in with Wolfram Language function naming principles. And you’ll need documentation that follows the same pattern as for built-in functions. I’ll say more later about these things. But for now, just notice that in the row of buttons at the top of the Definition Notebook there’s a Style Guidelines button that explains more about what to do, and there’s a Tools button that provides tools—especially for formatting documentation.

When you think you’re ready, press the Check button. It’s OK if you haven’t gotten all the details right yet. Because Check will automatically go through and do lots of style and consistency checks. Often it will make immediate suggestions for you to approve (“This line should end with a colon” and it’ll offer to put the colon in). Sometimes it will ask you to add or change something yourself. We’ll be continually adding to the automatic functionality of Check, but basically its goal is to ensure that anything you submit to the Function Repository already follows as many of the style guidelines as possible.

Check comments

OK, so after you run Check, you can use Preview. Preview generates a preview of the documentation page that you’ve defined for your function. You can choose to create a preview either in a desktop notebook, or in the cloud. If you don’t like something you see in the preview, just go back and fix it, and press Preview again.

Now you’re ready to deploy your function. The Deploy button provides four options:


The big thing you can do is to submit your function to the Wolfram Function Repository, so it’s available to everyone forever. But you can also deploy your function for more circumscribed use. For example, you can have the function just deployed locally on your computer, so it will be available whenever you use that particular computer. Or you can deploy it to your cloud account, so it will be available to you whenever you’re connected to the cloud. You can also deploy a function publicly through your cloud account. It won’t be in the central Wolfram Function Repository, but you’ll be able to give anyone a URL that’ll let them get your function from your account. (In the future, we’ll also be supporting organization-wide central repositories.)

OK, let’s say you’re ready to actually submit your function to the Wolfram Function Repository. Then, needless to say, you press Submit to Repository. So what happens then? Well, your submission immediately goes into a queue for review and approval by our team of curators.

As your submission goes through the process (which will typically take a few days) you’ll get status messages—as well as maybe suggestions. But as soon as your function is approved, it’ll immediately be published in the Wolfram Function Repository, and available for anyone to use. (And it’ll show up in New Functions digests, etc. etc.)

What Should Be in the Repository

We have very high standards for the completeness, robustness—and overall quality—of the 6000+ functions that we’ve painstakingly built into the Wolfram Language over the past 30+ years. The goal of the Wolfram Function Repository is to leverage all the structure and functionality that already exists in the Wolfram Language to add as many much more lightweight functions as possible.

Yes, functions in the Wolfram Function Repository need to follow the design principles of the Wolfram Language—so they fit in with other functions, and with users’ expectations about how functions should work. But they don’t need to have the same completeness or robustness.

In the built-in functions of the Wolfram Language, we work hard to make things be as general as possible. But in the Wolfram Function Repository, there’s nothing wrong with having a function that just handles some very specific, but useful, case. SendMailFromNotebook can accept notebooks in one specific format, and produce mail in one specific way. PolygonalDiagram makes diagrams only with particular colors and labeling. And so on.

Another thing about built-in functions is that we go to great pains to handle all the corner cases, to deal with bad input properly, and so on. In the Function Repository it’s OK to have a function that just handles the main cases—and ignores everything else.

Obviously it’s better to have functions that do more, and do it better. But the optimization for the Function Repository—as opposed to for the built-in functions of the Wolfram Language—is to have more functions, covering more functionality, rather than to deepen each function.

What about testing the functions in the Function Repository? The expectations are considerably lower than for built-in functions. But—particularly when functions depend on external resources such as APIs—it’s important to be continually running regression tests, which is what automatically happens behind the scenes. In the Definition Notebook, you can explicitly give (in the Additional Information section) as many tests as you want, defined either by input and output lines or by full symbolic VerificationTest objects. In addition, the system tries to turn the documentation examples you give into tests (though this can sometimes be quite tricky, e.g. for a function whose result depends on random numbers, or the time of day).
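For instance, an explicit test given in the Additional Information section might look something like this (the "AddOne" function name here is hypothetical, following the session example used later in this piece):

```
(* a sketch of an explicit test for a hypothetical repository function *)
VerificationTest[
 ResourceFunction["AddOne"][5], (* input to evaluate *)
 6                              (* expected output *)
]
```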

There’ll be a whole range of implementation complexity to the functions in the Function Repository. Some will be just a single line of code; others might involve thousands or tens of thousands of lines, probably spread over many subsidiary functions. When is it worth adding a function that takes only very little code to define? Basically, if there’s a good name for the function—that people would readily understand if they saw it in a piece of code—then it’s worth adding. Otherwise, it’s probably better just to write the code again each time you need to use it.

The primary purpose of the Function Repository (as its name suggests) is to introduce new functions. If you want to introduce new data, or new entities, then use the Wolfram Data Repository. But what if you want to introduce new kinds of objects to compute with?

There are really two cases. You might want a new kind of object that’s going to be used in new functions in the Function Repository. And in that case, you can always just write down a symbolic representation of it, and use it in the input or output of functions in the Function Repository.

But what if you want to introduce an object and then define how existing functions in the Wolfram Language should operate on it? Well, the Wolfram Language has always had an easy mechanism for that, called upvalues. And with certain restrictions (particularly for functions that don’t evaluate their arguments), the Function Repository lets you just introduce a function, and define upvalues for it. (To set expectations: getting a major new construct fully integrated everywhere in the Wolfram Language is typically a very significant undertaking, that can’t be achieved just with upvalues—and is the kind of thing we do as part of the long-term development of the language, but isn’t what the Function Repository is set up to deal with.)
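As a sketch of how that can work (the symbol quat here is purely hypothetical), one can give an upvalue so that an existing built-in function knows what to do with a new kind of symbolic object:

```
(* hypothetical symbolic object representing a quaternion; the upvalue
   attaches the definition to quat, not to the built-in Norm *)
quat /: Norm[quat[a_, b_, c_, d_]] := Sqrt[a^2 + b^2 + c^2 + d^2]

Norm[quat[1, 1, 1, 1]] (* evaluates to 2 *)
```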

But, OK, so what can be in the code for functions in the Function Repository? Anything built into the Wolfram Language, of course (at least so long as it doesn’t pose a security risk). Also, any function from the Function Repository. But there are other possibilities, too. A function in the Function Repository can call an API, either in the Wolfram Cloud or elsewhere. Of course, there’s a risk associated with this. Because there’s no guarantee that the API won’t change—and make the function in the Function Repository stop working. And to recognize issues like this, there’s always a note on the documentation page (under Requirements) for any function that relies on more than just built-in Wolfram Language functionality. (Of course, when real-world data is involved, there can be issues even with this functionality—because actual data in the world changes, and even sometimes changes its definitions.)

Does all the code for the Wolfram Function Repository have to be written in the Wolfram Language? The code inside an external API certainly doesn’t have to be. And, actually, nor even does local code. In fact, if you find a function in pretty much any external language or library, you should be able to make a wrapper that allows it to be used in the Wolfram Function Repository. (Typically this will involve using ExternalEvaluate or ExternalFunction in the Wolfram Language code.)
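Here's a minimal sketch of such a wrapper, assuming a Python installation that ExternalEvaluate is configured to find (the wrapper name is hypothetical):

```
(* wrap a Python library function as a Wolfram Language function *)
pythonFactorial[n_Integer?NonNegative] :=
 ExternalEvaluate["Python",
  "__import__('math').factorial(" <> ToString[n] <> ")"]
```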

So what’s the point of doing this? Basically, it’s to leverage the whole integrated Wolfram Language system and its unified design. You get the underlying implementation from an external library or language—but then you’re using the Wolfram Language’s rich symbolic structure to create a convenient top-level function that makes it easy for people to use whatever functionality has been implemented. And, at least in a perfect world, all the details of loading libraries and so on will be automatically taken care of through the Wolfram Language. (In practice, there can sometimes be issues getting external languages set up on a particular computer system—and in the cloud there are additional security issues to worry about.)

By the way, when you first look at typical external libraries, they often seem far too complicated to just be covered by a few functions. But in a great many cases, most of the complexity comes from building up the infrastructure needed for the library—and all the functions to support that. When one’s using the Wolfram Language, however, the infrastructure is usually already built in, and so one doesn’t need to expose all those support functions—and one only needs to create functions for the few “topmost” applications-oriented functions in the library.

The Ecosystem of the Repository

If you’ve written functions that you use all the time, then send them in to the Wolfram Function Repository! If nothing else, it’ll be much easier for you to use the functions yourself. And, of course, if you use the functions all the time, it’s likely other people will find them useful too.

Of course, you may be in a situation where you can’t—or don’t want to—share your functions, or where they access private resources. And in such cases, you can just deploy the functions to your own cloud account, setting permissions for who can access them. (If your organization has a Wolfram Enterprise Private Cloud, then this will soon be able to host its own private Function Repository, which can be administered within your organization, and set to force review of submissions, or not.)

Functions you submit to the Wolfram Function Repository don’t have to be perfect; they just have to be useful. And—a bit like the “Bugs” section in classic Unix documentation—there’s a section in the Definition Notebook called “Author Notes” in which you can describe limitations, issues, etc. that you’re already aware of about your function. In addition, when you submit your function you can include Submission Notes that’ll be read by the curation team.

Once a function is published, its documentation page always has two links at the bottom: “Send a message about this function”, and “Discuss on Wolfram Community”. If you send a message (say reporting a bug), you can check a box saying you want your message and contact information to be passed to the author of the function.

Often you’ll just want to use functions from the Wolfram Function Repository like built-in functions, without looking inside them. But if you want to “look inside”, there’s always a Source Notebook button at the top. Press it and you’ll get your own copy of the original Definition Notebook that was submitted to the Function Repository. Sometimes you might just want to look at this as an example. But you can also make your own modifications. Maybe you’ll want to deploy these on your computer or in your cloud account. Or maybe you’ll want to submit these to the Function Repository, perhaps as a better version of the original function.

In the future, we might support Git-style forking in the Function Repository. But for now, we’re keeping it simpler, and we’re always having just one canonical version of each function. And basically (unless they abandon it and don’t respond to messages) the original author of the function gets to control updates to it—and gets to submit new versions, which are then reviewed and, if approved, published.

OK, so how does versioning work? Right now, as soon as you use a function from the Function Repository its definition will get permanently stored on your computer (or in your cloud account, if you’re using the cloud). If there’s a new version of the function, then when you next use the function, you’ll get a message letting you know this. And if you want to update to the new version, you can do that with ResourceUpdate. (The “function blob” actually stores more information about versioning, and in the future we’re planning on making this conveniently accessible.)
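Textually, updating might look something like this (the function name is hypothetical):

```
(* get the function; its definition is cached after first use *)
f = ResourceFunction["AddOne"];

(* update the stored definition to the latest published version *)
ResourceUpdate[ResourceObject["AddOne"]]
```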

One of the great things about the Wolfram Function Repository is that any Wolfram Language program anywhere can use functions from it. If the program appears in a notebook, it’s often nice to format Function Repository functions as easy-to-read “function blobs” (perhaps with appropriate versioning set).

But you can always refer to any Function Repository function using a textual ResourceFunction[...]. And this is convenient if you’re directly writing code or scripts for the Wolfram Engine, say with an IDE or textual code editor. (And, yes, the Function Repository is fully compatible with the Free Wolfram Engine for Developers.)
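So, for example, in a .wls script run by the Wolfram Engine one might write (with a hypothetical function name):

```
(* purely textual use of a repository function in a script *)
Print[ResourceFunction["AddOne"][41]]
```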

How It Works

The Wolfram Function Repository uses exactly the same Resource System framework as all our other repositories (Data Repository, Neural Net Repository, Demonstrations Project, etc.). And like everything else in the Resource System, a ResourceFunction is ultimately based on a ResourceObject.

Here’s a ResourceFunction:



It’s somewhat complicated inside, but you can see some of what’s there using Information:



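Textually, that kind of inspection might look something like this (using a hypothetical function name):

```
rf = ResourceFunction["AddOne"]; (* hypothetical name *)

(* properties such as name, UUID, version and resource type *)
Information[rf]
```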
So how does setting up a resource function work? The simplest is the purely local case. Here’s an example that takes a function (here, just a pure function) and defines it as a resource function for this session:


DefineResourceFunction[1 + # &, "AddOne"]

Once you’ve made the definition, you can use the resource function:



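With the definition above, the usage is just:

```
ResourceFunction["AddOne"][10] (* returns 11 *)
```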
Notice that in this function blob there’s a black icon. This indicates that the function blob refers to an in-memory resource function defined for your current session. For a resource function that’s permanently stored on your computer, or in a cloud account, there’s a gray icon. And for an official resource function in the Wolfram Function Repository there’s an orange icon.

OK, so what happens when you use the Deploy menu in a Definition Notebook? First, it’ll take everything in the Definition Notebook and make a symbolic ResourceObject out of it. (And if you’re using a textual IDE—or a program—you can also explicitly create the ResourceObject.)

Deploying locally on your computer uses LocalCache on the resource object to store it as a LocalObject in your file system. Deploying in your cloud account uses CloudDeploy on the resource object, and deploying publicly in the cloud uses CloudPublish. In all cases, ResourceRegister is also used to register the name of the resource function so that ResourceFunction["name"] will work.

If you press Submit to Function Repository, then what’s happening underneath is that ResourceSubmit is being called on the resource object. (And if you’re using a textual interface, you can call ResourceSubmit directly.)
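As a sketch, assuming ro is the ResourceObject built from a Definition Notebook, the operations behind the various Deploy choices look something like:

```
(* store on this computer, as a LocalObject *)
LocalCache[ro];

(* deploy privately to your cloud account *)
CloudDeploy[ro];

(* deploy publicly through your cloud account *)
CloudPublish[ro];

(* register the name, so ResourceFunction["name"] finds it *)
ResourceRegister[ro];

(* submit to the Wolfram Function Repository for review *)
ResourceSubmit[ro]
```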

By default, the submission is made under the name associated with your Wolfram ID. But if you’re submitting on behalf of a group or an organization, then you can set up a separate Publisher ID, and you can instead use this as the name to associate with your submissions.

Once you’ve submitted something to the Function Repository, it’ll go into the queue for review. If you get comments back, they’ll usually be in the form of a notebook with extra “comment cells” added. You can always check on the status of your submission by going to the Resource System Contributor Portal. But as soon as it’s approved, you’ll be notified (by email), and your submission will be live on the Wolfram Function Repository.

Some Subtleties

At first, it might seem like it should be possible to take a Definition Notebook and just put it verbatim into the Function Repository. But actually there are quite a few subtleties—and handling them requires doing some fairly sophisticated metaprogramming, symbolically processing both the code defining the function, and the Definition Notebook itself. Most of this happens internally, behind the scenes. But it has some consequences that are worth understanding if you’re going to contribute to the Function Repository.

Here’s one immediate subtlety. When you fill out the Definition Notebook, you can just refer to your function everywhere by a name like MyFunction—that looks like an ordinary name for a function in the Wolfram Language. But for the Function Repository documentation, this gets replaced by ResourceFunction["MyFunction"]—which is what users will actually use.

Here’s another subtlety: when you create a resource function from a Definition Notebook, all the dependencies involved in the definition of the function need to be captured and explicitly included. And to guarantee that the definitions remain modular, one needs to put everything in a unique namespace. (Needless to say, the functions that do all this are in the Function Repository.)

Usually you’ll never see any evidence of the internal context used to set up this namespace. But if for some reason you return an unevaluated symbol from the innards of your function, then you’ll see that the symbol is in the internal context. However, when the Definition Notebook is processed, at least the symbol corresponding to the function itself is set up to be displayed elegantly as a function blob rather than as a raw symbol in an internal context.

The Function Repository is about defining new functions. And these functions may have options. Often these options will be ones (like, say, Method or ImageSize) that have already been used for built-in functions, and for which built-in symbols already exist. But sometimes a new function may need new options. To maintain modularity, one might like these options to be symbols defined in a unique internal context (or to be something like whole resource functions in their own right). But to keep things simple, the Function Repository allows new options to be given in definitions as strings. And, as a courtesy to the final user, these definitions (assuming they’ve used OptionValue and OptionsPattern) are also processed so that when the functions are used, the options can be given not only as strings but also as global symbols with the same name.
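Here's a sketch of a definition with a new string option (the names are hypothetical):

```
(* a new option given as a string, with a default value *)
scaledValue[x_, OptionsPattern[{"Multiplier" -> 2}]] :=
 x*OptionValue["Multiplier"]

scaledValue[5]                    (* 10 *)
scaledValue[5, "Multiplier" -> 3] (* 15 *)
```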

Most functions just do what they do each time they are called. But some functions need initialization before they can run in a particular session—and to deal with this there’s an Initialization section in the Definition Notebook.

Functions in the Function Repository can immediately make use of other functions that are already in the Repository. But how do you set up definitions for the Function Repository that involve two (or more) functions that refer to each other? Basically you just have to deploy them in your session, so you can refer to them as ResourceFunction["name"]. Then you can create the examples you want, and then submit the functions.
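As a sketch of that workflow (with hypothetical function names), one can define two mutually referring functions in the current session:

```
(* each function refers to the other only through ResourceFunction["name"] *)
DefineResourceFunction[
 Function[n, If[n == 0, True, ResourceFunction["IsOddFun"][n - 1]]],
 "IsEvenFun"]
DefineResourceFunction[
 Function[n, If[n == 0, False, ResourceFunction["IsEvenFun"][n - 1]]],
 "IsOddFun"]

ResourceFunction["IsEvenFun"][4] (* True *)
```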

What Happens When the Repository Gets Big?

Today we’re just launching the Wolfram Function Repository. But over time we expect it to grow dramatically, and as it grows there are a variety of issues that we know will come up.

The first is about function names and their uniqueness. The Function Repository is designed so that—like for built-in functions in the Wolfram Language—one can refer to any given function just by giving its name. But this inevitably means that the names of functions have to be globally unique across the Repository—so that, for example, there can be only one ResourceFunction["MyFavoriteFunction"] in the Repository.

This might seem like a big issue. But it’s worth realizing it’s basically the same issue as for things like internet domains or social network handles. And the point is that one simply has to have a registrar—and that’s one of the roles we’re playing for the Wolfram Function Repository. (For private versions of the Repository, their administrators can be registrars.) Of course an internet domain can be registered without having anything on it, but in the Function Repository the name of a function can only be registered if there’s an actual function definition to go with it.

And part of our role in managing the Wolfram Function Repository is to ensure that the name picked for a function is reasonable given the definition of the function—and that it fits in with Wolfram Language naming conventions. We’ve now had 30+ years of experience in naming built-in functions in the Wolfram Language, and our curation team brings that experience to the Function Repository. Of course, there are always tradeoffs. For example, it might seem nice to have a short name for some function. But it’s better to “name defensively” with a longer, more specific name, because then it’s less likely to collide with something one wants to do in the future.

(By the way, just adding some kind of contributor tag to disambiguate functions wouldn’t achieve much. Because unless one insists on always giving the tag, one will end up having to define a default tag for any given function. Oh, and allocating contributor tags again requires global coordination.)

As the Wolfram Function Repository grows, one of the issues that’s sure to arise is the discoverability of functions. Yes, there’s search functionality (and Definition Notebooks can include keywords, etc.). But for built-in functions in the Wolfram Language there’s all sorts of cross-linking in documentation which helps “advertise” functions. Functions in the Function Repository can link to built-in functions. But what about the other way around? We’re going to be experimenting with various schemes to expose Function Repository functions on the documentation pages for built-in functions.

For built-in functions in the Wolfram Language, there’s also a level of discoverability provided by the network of “guide pages” that give organized lists of functions relevant to particular areas. It’s always complicated to appropriately balance guide pages—and as the Wolfram Language has grown, it’s common for guide pages to have to be completely refactored. It’s fairly easy to put functions from the Function Repository into broad categories, and even to successively break up these categories. But it’s much more valuable to have properly organized guide pages. It’s not yet clear how best to produce these for the whole Function Repository. But for example CreateResourceObjectGallery in the Function Repository lets anyone put up a webpage containing their “picks” from the repository:

Resource gallery

The Wolfram Function Repository is set up to be a permanent repository of functions, where any function in it will always just work. But of course, there may be new versions of functions. And we fully expect some functions to be obsoleted over time. The functions will still work if they’re used in programs. But their documentation pages will point to new, better functions.

The Wolfram Function Repository is all about providing new functions quickly—and exploring new frontiers for how the Wolfram Language can be used. But we fully expect that some of what’s explored in the Function Repository will eventually make sense to become built-in parts of the core Wolfram Language. We’ve had a slightly similar flow over the past decade from functionality that was originally introduced in Wolfram|Alpha. And one of the lessons is that to achieve the standards of quality and coherence that we insist on for anything built into the Wolfram Language is a lot of work—that usually dwarfs the original implementation effort. But even so, a function in the Function Repository can serve as a very useful proof of concept for a future function built into the Wolfram Language.

And of course the critical thing is that a function in the Function Repository is something that’s available for everyone to use right now. Yes, an eventual built-in function could be much better and stronger. But the Function Repository lets people get access to new functions immediately. And, crucially, it lets those new functions be contributed by anyone.

Earlier in the history of the Wolfram Language this wouldn’t have worked so well. But now there is so much already built into the language—and so strong an understanding of the design principles of the language—that it’s feasible to have a large community of people add functions that will maintain the design consistency to make them broadly useful.

There’s incredible talent in the community of Wolfram Language users. (And, of course, that community includes many of the world’s top people in R&D across a vast range of fields.) I’m hoping that the Wolfram Function Repository will provide an efficient platform for that talent to be exposed and shared. And that together we’ll be able to create something that dramatically expands the domain to which the computational paradigm can be applied.

We’ve taken the Wolfram Language a long way in 30+ years. Now, together, let’s take it much further. And let’s use the Function Repository—as well as things like the Free Wolfram Engine for Developers—as a platform for doing that.

Remembering Murray Gell-Mann (1929–2019), Inventor of Quarks
Thu, 30 May 2019 15:58:15 +0000 Stephen Wolfram

First Encounters

In the mid-1970s, particle physics was hot. Quarks were in. Group theory was in. Field theory was in. And so much progress was being made that it seemed like the fundamental theory of physics might be close at hand.

Right in the middle of all this was Murray Gell-Mann—responsible for not one, but most of the leaps of intuition that had brought particle physics to where it was. There’d been other theories, but Murray’s—with their somewhat elaborate and abstract mathematics—were always the ones that seemed to carry the day.

It was the spring of 1978 and I was 18 years old. I’d been publishing papers on particle physics for a few years, and had gotten quite known around the international particle physics community (and, yes, it took decades to live down my teenage-particle-physicist persona). I was in England, but planned to soon go to graduate school in the US, and was choosing between Caltech and Princeton. And one weekend afternoon when I was about to go out, the phone rang. In those days, it was obvious if it was an international call. “This is Murray Gell-Mann”, the caller said, then launched into a monologue about why Caltech was the center of the universe for particle physics at the time.

This essay is also in Scientific American.

Perhaps not as starstruck as I should have been, I asked a few practical questions, which Murray dismissed. The call ended with something like, “Well, we’d like to have you at Caltech”.

A few months later I was indeed at Caltech. I remember the evening I arrived, wandering around the empty 4th floor of Lauritsen Lab—the home of Caltech theoretical particle physics. There were all sorts of names I recognized on office doors, and there were two offices that were obviously the largest: “M. Gell-Mann” and “R. Feynman”. (In between them was a small office labeled “H. Tuck”—which by the next day I’d realized was occupied by Helen Tuck, the lively longtime departmental assistant.)

There was a regular Friday lunch in the theoretical physics group, and as soon as a Friday came around, I met Murray Gell-Mann there. The first thing he said to me was, “It must be a culture shock coming here from England”. Then he looked me up and down. There I was in an unreasonably bright yellow shirt and sandals—looking, in fact, quite Californian. Murray seemed embarrassed, mumbled some pleasantry, then turned away.

With Murray at Caltech

I never worked directly with Murray (though he would later describe me to others as “our student”). But I interacted with him frequently while I was at Caltech. He was a strange mixture of gracious and gregarious, together with austere and combative. He had an expressive face, which would wrinkle up if he didn’t approve of what was being said.

Murray always had people and things he approved of, and ones he didn’t—to which he would often give disparaging nicknames. (He would always refer to solid-state physics as “squalid-state physics”.) Sometimes he would pretend that things he did not like simply did not exist. I remember once talking to him about something in quantum field theory called the beta function. His face showed no recognition of what I was talking about, and I was getting slightly exasperated. Eventually I blurted out, “But, Murray, didn’t you invent this?” “Oh”, he said, suddenly much more charming, “You mean g times the psi function. Why didn’t you just say that? Now I understand”. Of course, he had understood all along, but was being difficult about me using the “beta function” term, even though it had by then been standard for years.

I could never quite figure out what it was that made Murray impressed by some people and not others. He would routinely disparage physicists who were destined for great success, and would vigorously promote ones who didn’t seem so promising, and didn’t in fact do well. So when he promoted me, I was on the one hand flattered, but on the other hand concerned about what his endorsement might really mean.

The interaction between Murray Gell-Mann and Richard Feynman was an interesting thing to behold. Both came from New York, but Feynman relished his “working-class” New York accent, while Gell-Mann affected the best pronunciation of words from any language. Both would make surprisingly childish comments about the other.

I remember Feynman insisting on telling me the story of the origin of the word “quark”. He said he’d been talking to Murray one Friday about these hypothetical particles, and in their conversation they’d needed a name for them. Feynman told me he said (no doubt in his characteristic accent), “Let’s call them ‘quacks’”. The next Monday he said Murray came to him very excited and said he’d found the word “quark” in James Joyce. In telling this to me, Feynman then went into a long diatribe about how Murray always seemed to think the names for things were so important. “Having a name for something doesn’t tell you a damned thing”, Feynman said. (Having now spent so much of my life as a language designer, I might disagree). Feynman went on, mocking Murray’s concern for things like what different birds are called. (Murray was an avid bird watcher.)

Meanwhile, Feynman had worked on particles which seemed (and turned out to be) related to quarks. Feynman had called them “partons”. Murray insisted on always referring to them as “put-ons”.

Even though in terms of longstanding contributions to particle physics (if not physics in general) Murray was the clear winner, he always seemed to feel as if he was in the shadow of Feynman, particularly with Feynman’s showmanship. When Feynman died, Murray wrote a rather snarky obituary, saying of Feynman: “He surrounded himself with a cloud of myth, and he spent a great deal of time and energy generating anecdotes about himself”. I never quite understood why Murray—who could have gone to any university in the world—chose to work at Caltech for 33 years in an office two doors down from Feynman.

Murray cared a lot about what people thought of him, but would routinely (and maddeningly to watch) put himself in positions where he would look bad. He was very interested in—and I think very knowledgeable about—words and languages. And when he would meet someone, he would make a point of regaling them with information about the origin of their name (curiously—as I learned only years later—his own name, “Gell-Mann”, had been “upgraded” from “Gellmann”). Now, of course, if there’s one word people tend to know something about, it’s their own name. And, needless to say, Murray sometimes got its origins wrong—and was very embarrassed. (I remember he told a friend of mine named Nathan Isgur a long and elaborate story about the origin of the name “Isgur”, with Nathan eventually saying: “No, it was made up at Ellis Island!”.)

Murray wasn’t particularly good at reading other people. I remember in early 1982 sitting next to Murray in a limo in Chicago that had just picked up a bunch of scientists for some event. The driver was reading the names of the people he’d picked up over the radio. Many were complicated names, which the driver was admittedly butchering. But after each one, Murray would pipe up, and say “No, it’s said ____”. The driver was getting visibly annoyed, and eventually I said quietly to Murray that he should stop correcting him. When we arrived, Murray said to me: “Why did you say that?” He seemed upset that the driver didn’t care about getting the names right.

Occasionally I would ask Murray for advice, though he would rarely give it. When I was first working on one-dimensional cellular automata, I wanted to find a good name for them. (There had been several previous names for the 2D case, one of which—that I eventually settled on—was “cellular automata”.) I considered the name “polymones” (somehow reflecting Leibniz’s monad concept). But I asked Murray—given all his knowledge of words and languages—for a suggestion. He said he didn’t think polymones was much good, but didn’t have any other suggestion.

When I was working on SMP (a forerunner of Mathematica and the Wolfram Language) I asked Murray about it, though at the time I didn’t really understand as I do now the correspondences between human and computational languages. Murray was interested in trying out SMP, and had a computer terminal installed in his office. I kept on offering to show him some things, but he kept on putting it off. I later realized that—bizarrely to me—Murray was concerned about me seeing that he didn’t know how to type. (By the way, at the time, few people did—which is, for example, why SMP, like Unix, had cryptically short command names.)

But alongside the brush-offs and the strangeness, Murray could be personally very gracious. I remember him inviting me several times to his house. I never interacted with either of his kids (who were both not far from my age). But I did interact with his wife, Margaret, who was a very charming English woman. (As part of his dating advice to me, Feynman had explained that both he and Murray had married English women because “they could cope”.)

While I was at Caltech, Margaret got very sick with cancer, and Murray threw himself into trying to find a cure. (He blamed himself for not having made sure Margaret had had more checkups.) It wasn’t long before Margaret died. Murray invited me to the memorial service. But somehow I didn’t feel I could go; even though by then I was on the faculty at Caltech, I just felt too young and junior. I think Murray was upset I didn’t come, and I’ve felt guilty and embarrassed about it ever since.

Murray did me quite a few favors. He was an original board member of the MacArthur Foundation, and I think was instrumental in getting me a MacArthur Fellowship in the very first batch. Later, when I ran into trouble with intellectual property issues at Caltech, Murray went to bat for me—attempting to intercede with his longtime friend Murph Goldberger, who was by then president of Caltech (and who, before Caltech, had been a professor at Princeton, and had encouraged me to go to graduate school there).

I don’t know if I would call Murray a friend, though, for example, after Margaret died, he and I would sometimes have dinner together, at random restaurants around Pasadena. It wasn’t so much that I felt of a different generation from him (which of course I was). It was more that he exuded a certain aloof tension, that made one not feel very sure about what the relationship really was.

A Great Time in Physics

At the end of World War II, the Manhattan Project had just happened, the best and the brightest were going into physics, and “subatomic particles” were a major topic. Protons, neutrons, electrons and photons were known, and together with a couple of hypothesized particles (neutrinos and pions), it seemed possible that the story of elementary particles might be complete.

But then, first in cosmic rays, and later in particle accelerators, new particles started showing up. There was the muon, then the mesons (pions and kaons), and the hyperons (Λ, Σ, Ξ). All were unstable. The muon—which basically nobody understands even today—was like a heavy electron, interacting mainly through electromagnetic forces. But the others were subject to the strong nuclear force—the one that binds nuclei together. And it was observed that this force could generate these particles, though always together (Λ with K, for example). But, mysteriously, the particles could only decay through so-called weak interactions (of the kind involved in radioactive beta decay, or the decay of the muon).

For a while, nobody could figure out why this could be. But then around 1953, Murray Gell-Mann came up with an explanation. Just as particles have “quantum numbers” like spin and charge, he hypothesized that they could have a new quantum number that he called strangeness. Protons, neutrons and pions would have zero strangeness. But the Λ would have strangeness -1, the (positive) kaon strangeness +1, and so on. And total strangeness, he suggested, might be conserved in strong (and electromagnetic) interactions, but not in weak interactions. To suggest a fundamentally new property of particles was a bold thing to do. But it was correct: and immediately Murray was able to explain lots of things that had been observed.
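As a concrete illustration of how the bookkeeping works (the classic textbook example of associated production, not drawn from this essay): strangeness balances in the strong interaction, so the Λ and the kaon must be produced together:

```latex
\pi^- + p \;\to\; K^0 + \Lambda
\qquad
S:\;\; 0 + 0 \;\to\; (+1) + (-1) = 0
```

But the Λ on its own decays via Λ → p + π⁻, changing total strangeness from −1 to 0, which only the weak interaction allows—hence the otherwise puzzlingly slow decays.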

But how did the weak interaction that was—among other things—responsible for the decay of Murray’s “strange particles” actually work? In 1957, in their one piece of collaboration in all their years together at Caltech, Feynman and Gell-Mann introduced the so-called V-A theory of the weak interaction—and, once again, despite initial experimental evidence to the contrary, it turned out to be correct. (The theory basically implies that neutrinos can only have left-handed helicity, and that weak interactions involve parity conservation and parity violation in equal amounts.)

As soon as the quantum mechanics of electrons and other particles was formulated in the 1920s, people started wondering about the quantum theory of fields, particularly the electromagnetic field. There were issues with infinities, but in the late 1940s—in Feynman’s big contribution—these were handled through the concept of renormalization. The result was that it was possible to start computing things using quantum electrodynamics (QED)—and soon all sorts of spectacular agreements with experiment had been found.

But all these computations worked by looking at just the first few terms in a series expansion in powers of the interaction strength parameter α≃1/137. In 1954, during his brief time at the University of Illinois (from which he went to the University of Chicago, and then Caltech), Murray, together with Francis Low, wrote a paper entitled “Quantum Electrodynamics at Small Distances” which was an attempt to explore QED to all orders in α. In many ways this paper was ahead of its time—and 20 years later, the “renormalization group” that it implicitly defined became very important (and the psi function that it discussed was replaced by the beta function).

While QED could be investigated through a series expansion in the small parameter α≃1/137, no such program seemed possible for the strong interaction (where the effective expansion parameter would be ≃1). So in the 1950s there was an attempt to take a more holistic approach, based on looking at the whole so-called S-matrix defining overall scattering amplitudes. Various properties of the S-matrix were known—notably analyticity with respect to values of particle momenta, and so-called crossing symmetry associated with exchanging particles and antiparticles.

But were these sufficient to understand the properties of strong interactions? Throughout the 1960s, attempts involving more and more elaborate mathematics were made. But things kept on going wrong. The proton-proton total interaction probability was supposed to rise with energy. But experimentally it was seen to level off. So a new idea (the pomeron) was introduced. But then the interaction probability was found to start rising again. So another phenomenon (multiparticle “cuts”) had to be introduced. And so on. (Ironically enough, early string theory spun off from these attempts—and today, after decades of disuse, S-matrix theory is coming back into vogue.)

But meanwhile, there was another direction being explored—in which Murray Gell-Mann was centrally involved. It all had to do with the group-theory-meets-calculus concept of Lie groups. An example of a Lie group is the 3D rotation group, known in Lie group theory as SO(3). A central issue in Lie group theory is to find representations of groups: finite collections, say of matrices, that operate like elements of the group.

Representations of the rotation group had been used in atomic physics to deduce from rotational symmetry a characterization of possible spectral lines. But what Gell-Mann did was to say, in effect, “Let’s just imagine that in the world of elementary particles there’s some kind of internal symmetry associated with the Lie group SU(3). Now use representation theory to characterize what particles will exist”.

And in 1961, he published his eightfold way (named after Buddha’s Eightfold Way) in which he proposed—periodic-table style—that there should be 8+1 types of mesons, and 10+8 types of baryons (hyperons plus nucleons, such as the proton and neutron). For the physics of the time, the mathematics involved in this was quite exotic. But the known particles organized nicely into Gell-Mann’s structure. And Gell-Mann made a prediction: that there should be one additional type of hyperon, which he called the Ω⁻, with strangeness -3, and certain mass and decay characteristics.
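For readers curious where the counts 8+1 and 10+8 come from, they follow from the standard decompositions of tensor products of the fundamental representation 3 of SU(3) (a textbook result, sketched here rather than taken from the original text):

```latex
% Mesons: a quark paired with an antiquark
3 \otimes \bar{3} = 8 \oplus 1
% Baryons: three quarks
3 \otimes 3 \otimes 3 = 10 \oplus 8 \oplus 8 \oplus 1
```

The octets and the decuplet are exactly the multiplets into which the known mesons and baryons fell.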

And—sure enough—in 1964, the Ω⁻ was observed, and Gell-Mann was on his way to the Nobel Prize, which he received in 1969.

At first the SU(3) symmetry idea was just about what particles should exist. But Gell-Mann wanted also to characterize interactions associated with particles, and for this he introduced what he called current algebra. And, by 1964, from his work on current algebra, he’d realized something else: that his SU(3) symmetry could be interpreted as meaning that things like protons were actually composed of something more fundamental—that he called quarks.

What exactly were the quarks? In his first paper on the subject, Gell-Mann called them “mathematical entities”, although he admitted that, just maybe, they could actually be particles themselves. There were problems with this, though. First, it was thought that electric charge was quantized in units of the electron charge, but quarks would have to have charges of 2/3 and -1/3. But even more seriously, one would have to explain why no free quarks had ever been seen.

It so happened that right when Gell-Mann was writing this, a student at Caltech named George Zweig was thinking of something very similar. Zweig (who was at the time visiting CERN) took a mathematically less elaborate approach, observing that the existing particles could be explained as built from three kinds of “aces”, as he called them, with the same properties as Gell-Mann’s quarks.

Zweig became a professor at Caltech—and I’ve personally been friends with him for more than 40 years. But he never got as much credit for his aces idea as he should (though in 1977 Feynman proposed him for a Nobel Prize), and after a few years he left particle physics and started studying the neurobiology of the ear—and now, in his eighties, has started a quant hedge fund.

Meanwhile, Gell-Mann continued pursuing the theory of quarks, refining his ideas about current algebras. But starting in 1968, there was something new: particle accelerators able to collide high-energy electrons with protons (“deep inelastic scattering”) observed that sometimes the electrons could suffer large deflections. There were lots of details, particularly associated with relativistic kinematics, but in 1969 Feynman proposed his parton (or, as Gell-Mann called it, “put-on”) model, in which the proton contained point-like “parton” particles.

It was immediately guessed that partons might be quarks, and within a couple of years this had been established. But the question remained of why the quarks should be confined inside particles such as protons. To avoid some inconsistencies associated with the exclusion principle, it had already been suggested that quarks might come in three “colors”. Then in 1973, Gell-Mann and his collaborators suggested that associated with these colors, quarks might have “color charges” analogous to electric charge.

Electromagnetism can be thought of as a gauge field theory associated with the Lie group U(1). Now Gell-Mann suggested that there might be a gauge field theory associated with an SU(3) color group (yes, SU(3) again, but a different application than in the eightfold way, etc.). This theory became known as quantum chromodynamics, or QCD. And, in analogy to the photon, it involves particles called gluons.

Unlike photons, however, gluons directly interact with each other, leading to a much more complex theory. But in direct analogy to Gell-Mann and Low’s 1954 renormalization group computation for QED, in 1973 the beta function (AKA g times psi function) for QCD was computed, and was found to show the phenomenon of asymptotic freedom—essentially that QCD interactions get progressively weaker at shorter distances.

This immediately explained the success of the parton model, but also suggested that if quarks get further apart, the QCD interactions between them get stronger, potentially explaining confinement. (And, yes, this is surely the correct intuition about confinement, although even to this day, there is no formal proof of quark confinement—and I suspect it may have issues of undecidability.)
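The statement about asymptotic freedom can be made concrete with the standard one-loop result (the textbook form, not part of the original text; here n_f is the number of quark flavors and μ the energy scale):

```latex
\beta(g) \;=\; \mu \frac{\partial g}{\partial \mu}
\;=\; -\left( 11 - \frac{2 n_f}{3} \right) \frac{g^3}{16\pi^2} \;+\; O(g^5)
```

Since the coefficient is negative for n_f ≤ 16, the coupling g shrinks as μ grows: QCD interactions get weaker at short distances, and correspondingly stronger as quarks are pulled apart.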

Through much of the 1960s, S-matrix theory had been the dominant approach to particle physics. But it was having trouble, and the discovery of asymptotic freedom in QCD in 1973 brought field theory back to the fore, and, with it, lots of optimism about what might be possible in particle physics.

Murray Gell-Mann had had an amazing run. For 20 years he had made a series of bold conjectures about how nature might work—strangeness, V-A theory, SU(3), quarks, QCD—and in each case he had been correct, while others had been wrong. He had had one of the more remarkable records of repeated correct intuition in the whole history of science.

He tried to go on. He talked about “grand unification being in the air”, and (along with many other physicists) discussed the possibility that QCD and the theory of weak interactions might be unified in models based on groups like SU(5) and SO(10). He considered supersymmetry—in which there would be particles that are crosses between things like neutrinos and things like gluons. But quick validations of these theories didn’t work out—though even now it’s still conceivable that some version of them might be correct.

But regardless, the mid-1970s were a period of intense activity for particle physics. In 1974, the J/ψ particle was discovered, which turned out to be associated with a fourth kind of quark (the charm quark). In 1978, evidence of a fifth quark was seen. Lots was figured out about how QCD works. And a consistent theory of weak interactions emerged that, together with QED and QCD, defined what by the early 1980s had become the modern Standard Model of particle physics that exists today.

I myself got seriously interested in particle physics in 1972, when I was 12 years old. I used to carry around a copy of the little Particle Properties booklet—and all the various kinds of particles became, in a sense, my personal friends. I knew by heart the mass of the Λ, the lifetime of the Ω⁻, and a zillion other things about particles. (And, yes, amazingly, I still seem to remember almost all of them—though now they’re all known to much greater accuracy.)

At the time, it seemed to me like the most important discoveries ever were being made: fundamental facts about the fundamental particles that exist in our universe. And I think I assumed that before long everyone would know these things, just as people know that there are atoms and protons and electrons.

But I’m shocked today that almost nobody has, for example, even heard of muons—even though we’re continually bombarded with them from cosmic rays. Talk about strangeness, or the omega-minus, and one gets blank stares. Quarks more people have heard of, though mostly because of their name, with its various uses for brands, etc.

To me it feels a bit tragic. It’s not hard to show Gell-Mann’s eightfold way pictures, and to explain how the particles in them can be made from quarks. It’s at least as easy to explain that there are 6 known types of quarks as to explain about chemical elements or DNA bases. But for some reason—in most countries—all these triumphs of particle physics have never made it into school science curriculums.

And as I was writing this piece, I was shocked at how thin the information on “classic” particle physics is on the web. In fact, in trying to recall some of the history, the most extensive discussion I could find was in an unpublished book I myself wrote when I was 12 years old! (Yes, full of charming spelling mistakes, and a few physics mistakes.)

The Rest of the Story

When I first met Murray in 1978, his great run of intuition successes and his time defining almost everything that was important in particle physics was already behind him. I was never quite sure what he spent his time on. I know he traveled a lot, using physics meetings in far-flung places as excuses to absorb local culture and nature. I know he spent significant time with the JASON physicists-consult-for-the-military-and-get-paid-well-for-doing-so group. (It was a group that also tried to recruit me in the mid-1980s.) I know he taught classes at Caltech—though he had a reputation for being rather disorganized and unprepared, and I often saw him hurrying to class with giant piles of poorly collated handwritten notes.

Quite often I would see him huddled with more junior physicists that he had brought to Caltech with various temporary jobs. Often there were calculations being done on the blackboard, sometimes by Murray. Lots of algebra, usually festooned with tensor indices—with rarely a diagram in sight. What was it about? I think in those days it was most often supergravity—a merger of the idea of supersymmetry with an early form of string theory (itself derived from much earlier work on S-matrix theory).

This was the time when QCD, quark models and lots of other things that Murray had basically created were at their hottest. Yet Murray chose not to work on them—for example telling me after hearing a talk I gave on QCD that I should work on more worthwhile topics.

I’m guessing Murray somehow thought that his amazing run of intuition would continue, and that his new theories would be as successful as his old. But it didn’t work out that way. Though when I would see Murray, he would often tell me of some amazing physics that he was just about to crack, often using elaborate mathematical formalism that I didn’t recognize.

By the time I left Caltech in 1983, Murray was spending much of his time in New Mexico, around Santa Fe and Los Alamos—particularly getting involved in what would become the Santa Fe Institute. In 1984, I was invited to the inaugural workshop discussing what the then-planned Rio Grande Institute might do. It was a strange event, at which I was by far the youngest participant. And as chance would have it, in connection with the republication of the proceedings of that event, I just recently wrote an account of what happened there, which I will soon post.

But in any case, Murray was co-chairing the event, and talking about his vision for a great interdisciplinary university, in which people would study things like the relations between physics and archaeology. He talked in grand flourishes about covering the arts and sciences, the simple and the complex, and linking them all together. It didn’t seem very practical to me—and at some point I asked what the Santa Fe Institute would actually concentrate on if it had to make a choice.

People asked what I would suggest, and I (somewhat reluctantly, because it seemed like everyone had been trying to promote their pet area) suggested “complex systems theory”, and my ideas about the emergence of complexity from things like simple programs. The audio of the event records some respectful exchanges between Murray and me, though more about organizational matters than content. But as it turned out, complex systems theory was indeed what the Santa Fe Institute ended up concentrating on. And Murray himself began to use “complexity” as a label for things he was thinking about.

I tried for years (starting when I first worked on such things, in 1981) to explain to Murray about cellular automata, and about my explorations of the computational universe. He would listen politely, and pay lip service to the relevance of computers and experiments with them. But—as I later realized—he never really understood much at all of what I was talking about.

By the late 1980s, I saw Murray only very rarely. I heard, though, that through an agent I know, Murray had got a big advance to write a book. Murray always found writing painful, and before long I heard that the book had gone through multiple editors (and publishers), and that Murray thought it responsible for a heart attack he had. I had hoped that the book would be an autobiography, though I suspected that Murray might not have the introspection to produce that. (Several years later, a New York Times writer named George Johnson wrote what I considered a very good biography of Murray, which Murray hated.)

But then I heard that Murray’s book was actually going to be about his theory of complexity, whatever that might be. A few years went by, and, eventually, in 1994, to rather modest fanfare, Murray’s book The Quark and the Jaguar appeared. Looking through it, though, it didn’t seem to contain anything concrete that could be considered a theory of complexity. George Zweig told me he’d heard that Murray had left people like me and him out of the index to the book, so we’d have to read the whole book if we wanted to find out what he said about us.

At the time, I didn’t bother. But just now, in writing this piece, I was curious to find out what, if anything, Murray actually did say about me. In the printed book, the index goes straight from “Winos” to “Woolfenden”. But online I can find that there I am, on page 77 (and, bizarrely, I’m also in the online index): “As Stephen Wolfram has emphasized, [a theory] is a compressed package of information, applicable to many cases”. Yes, that’s true, but is that really all Murray got out of everything I told him? (George Zweig, by the way, isn’t mentioned in the book at all.)

In 2002, I’d finally finished my own decade-long basic science project, and I was getting ready to publish my book A New Kind of Science. In recognition of his early support, I’d mentioned Murray in my long list of acknowledgements in the book, and I thought I’d reach out to him and see if he’d like to write a back-cover blurb. (In the end, Steve Jobs convinced me not to have any back-cover blurbs: “Isaac Newton didn’t have blurbs on the Principia; nor should you on your book”.)

Murray responded politely: “It is exciting to know that your magnum opus, reflecting so much thought, research, and writing, will finally appear. I should, of course, be delighted to receive the book and peruse it, and I might be able to come up with an endorsement, especially since I expect to be impressed”. But he said, “I find it difficult to write things under any conditions, as you probably know”.

I sent Murray the book, and soon thereafter was on the phone with him. It was a strange and contentious conversation. Murray was obviously uncomfortable. I was asking him about what he thought complexity was. He said it was “like a child learning a language”. I asked what that meant. We went back and forth talking about languages. I had the distinct sense that Murray thought he could somehow blind me with facts I didn’t know. But—perhaps unfortunately for the conversation—even though A New Kind of Science doesn’t discuss languages much, my long efforts in computational language design had made me quite knowledgeable about the topic, and in the conversation I made it quite clear that I wasn’t convinced about what Murray had to say.

Murray followed up with an email: “It was good to talk with you. I found the exchange of ideas very interesting. We seem to have been thinking about many of the same things over the last few years, and apparently we agree on some of them and have quite divergent views on others”. He talked about the book, saying that “Obviously, I can’t, in a brief perusal, come to any deep conclusions about such an impressive tome. It is clear, however, that there are many ideas in it with which, if I understand them correctly, I disagree”.

Then he continued: “Also, my own work of the last decade or so is not mentioned anywhere, even though that work includes discussions of the meaning and significance of simplicity and complexity, the role of decoherent histories in the understanding of quantum mechanics, and other topics that play important roles in A New Kind of Science”. (Actually, I don’t think I discussed anything relevant to decoherent histories in quantum mechanics.) He explained that he didn’t want to write a blurb, and ended: “I’m sorry, and I hope that this matter does not present any threat to our friendship, which I hold dear”.

As it turned out, I never talked to Murray about science again. The last time I saw Murray was in 2012 at a peculiar event in New York City for promising high-school students. I said hello. Murray looked blank. I said my name, and held up my name tag. “Do I know you?”, he said. I repeated my name. Still blank. I couldn’t tell if it was a problem of age—or a repeat of the story of the beta function. But, with regret, I walked away.

I have often used Murray as an example of the challenges of managing the arc of a great career. From his twenties to his forties, Murray had the golden touch. His particular way of thinking had success after success, and in many ways, he defined physics for a generation. But by the time I knew him, the easy successes were over. Perhaps it was Murray; more likely, it was just that the easy pickings from his approach were now gone.

I think Murray always wanted to be respected as a scholar and statesman of science—and beyond. But—to his chagrin—he kept on putting himself in situations that played to his weaknesses. He tried to lead people, but usually ended up annoying them. He tried to become a literary-style author, but his perfectionism and insecurity got in the way. He tried to do important work in new fields, but ended up finding that his particular methods didn’t work there. To me, it felt in many ways tragic. He so wanted to succeed as he had before, but he never found a way to do it—and always bore the burden of his early success.

Still, with all his complexities, I am pleased to have known Murray. And though Murray is now gone, the physics he discovered will live on, defining an important chapter in the quest for our understanding of the fundamental structure of our universe.

Launching Today: Free Wolfram Engine for Developers
Tue, 21 May 2019 | Stephen Wolfram

Why Aren’t You Using Our Technology?

It happens far too often. I’ll be talking to a software developer, and they’ll be saying how great they think our technology is, and how it helped them so much in school, or in doing R&D. But then I’ll ask them, “So, are you using Wolfram Language and its computational intelligence in your production software system?” Sometimes the answer is yes. But too often, there’s an awkward silence, and then they’ll say, “Well, no. Could I?”

I want to make sure the answer to this can always be: “Yes, it’s easy!” And to help achieve that, we’re releasing today the Free Wolfram Engine for Developers. It’s a full engine for the Wolfram Language that can be deployed on any system—and called from programs, languages, web servers, or anything.

The Wolfram Engine is the heart of all our products. It’s what implements the Wolfram Language, with all its computational intelligence, algorithms, knowledgebase, and so on. It’s what powers our desktop products (including Mathematica), as well as our cloud platform. It’s what’s inside Wolfram|Alpha—as well as an increasing number of major production systems out in the world. And as of today, we’re making it available for anyone to download, for free, to use in their software development projects.

The Wolfram Language

Many people know the Wolfram Language (often in the form of Mathematica) as a powerful system for interactive computing—and for doing R&D, education, data science and “computational X” for many X. But increasingly it’s also being used “behind the scenes” as a key component in building production software systems. And what the Free Wolfram Engine for Developers now does is to package it so it’s convenient to insert into a whole range of software engineering environments and projects.

It’s worth explaining a bit about how I see the Wolfram Language these days. (By the way, you can run it immediately on the web in the Wolfram Language Sandbox.) The most important thing is to realize that the Wolfram Language as it now exists is really a new kind of thing: a full-scale computational language. Yes, it’s an extremely powerful and productive (symbolic, functional, …) programming language. But it’s much more than that. Because it’s got the unique feature of having a huge amount of computational knowledge built right into it: knowledge about algorithms, knowledge about the real world, knowledge about how to automate things.

We’ve been steadily building up what’s now the Wolfram Language for more than 30 years—and one thing I’m particularly proud of (though it’s hard work; e.g. check out the livestreams!) is how uniform, elegant and stable a design we’ve been able to maintain across the whole language. There are now altogether 5000+ functions in the language, covering everything from visualization to machine learning, numerics, image computation, geometry, higher math and natural language understanding—as well as lots of areas of real-world knowledge (geo, medical, cultural, engineering, scientific, etc.).

In recent years, we’ve also introduced lots of hardcore software engineering capabilities—instant cloud deployment, network programming, web interaction, database connectivity, import/export (200+ formats), process control, unit testing, report generation, cryptography, blockchain, etc. (The symbolic nature of the language makes these particularly clean and powerful.)

The goal of the Wolfram Language is simple, if ambitious: have everything be right there, in the language, and be as automatic as possible. Need to analyze an image? Need geographic data? Audio processing? Solve an optimization problem? Weather information? Generate 3D geometry? Anatomical data? NLP entity identification? Find anomalies in a time series? Send a mail message? Get a digital signature? All these things (and many, many more) are just functions that you can immediately call in any program you write in Wolfram Language. (There are no libraries to hunt down; everything is just integrated into the language.)

Back on the earliest computers, all one had was machine code. But then came simple programming languages. And soon one could also take it for granted that one’s computer would have an operating system. Later also networking, then a user interface, then web connectivity. My goal with the Wolfram Language is to provide a layer of computational intelligence that in effect encapsulates the computational knowledge of our civilization, and lets people take it for granted that their computer will know how to identify objects in an image, or how to solve equations, or what the populations of cities are, or countless other things.

And now, today, what we want to do with the Free Wolfram Engine for Developers is to make this something ubiquitous, and immediately available to any software developer.

The Wolfram Engine

The Free Wolfram Engine for Developers implements the full Wolfram Language as a software component that can immediately be plugged into any standard software engineering stack. It runs on any standard platform (Linux, Mac, Windows, RasPi, …; desktop, server, virtualized, distributed, parallelized, embedded). You can use it directly with a script, or from a command line. You can call it from programming languages (Python, Java, .NET, C/C++, …), or from other systems (Excel, Jupyter, Unity, Rhino, …). You can call it through sockets, ZeroMQ, MQTT or its own native WSTP (Wolfram Symbolic Transfer Protocol). It reads and writes hundreds of formats (CSV, JSON, XML, …), and connects to databases (SQL, RDF/SPARQL, Mongo, …), and can call external programs (executables, libraries, …), browsers, mail servers, APIs, devices, and languages (Python, NodeJS, Java, .NET, R, …). Soon it’ll also plug directly into web servers (J2EE, aiohttp, Django, …). And you can edit and manage your Wolfram Language code with standard IDEs, editors and tools (Eclipse, IntelliJ IDEA, Atom, Vim, Visual Studio Code, Git, …).
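To make one of those integration routes concrete, here is a minimal sketch (my own, not from this announcement) of driving the engine from Python through the `wolframscript` command-line program that ships with the Wolfram Engine. The helper name `run_wolfram` is hypothetical; the sketch assumes `wolframscript` is on the PATH, and simply returns `None` on a machine where it isn't.

```python
import shutil
import subprocess

def run_wolfram(code: str):
    """Evaluate a piece of Wolfram Language code via the wolframscript
    command-line tool, returning its printed result as a string, or
    None if no Wolfram Engine is installed on this machine."""
    if shutil.which("wolframscript") is None:
        return None  # no local Wolfram Engine available
    completed = subprocess.run(
        ["wolframscript", "-code", code],
        capture_output=True, text=True,
    )
    return completed.stdout.strip()

# For example, the number of decimal digits in 2^1000:
print(run_wolfram("IntegerLength[2^1000]"))
```

The same engine can equally be reached over WSTP, sockets or ZeroMQ from the other languages listed above; the subprocess route is just the shortest one to write down.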

The Free Wolfram Engine for Developers has access to the whole Wolfram Knowledgebase, through a free Basic subscription to the Wolfram Cloud. (Unless you want real-time data, everything can be cached, so you can run the Wolfram Engine without network connectivity.) The Basic subscription to the Wolfram Cloud also lets you deploy limited APIs in the cloud.

A key feature of the Wolfram Language is that you can run the exact same code anywhere. You can run it interactively using Wolfram Notebooks—on the desktop, in the cloud, and on mobile. You can run it in a cloud API (or scheduled task, etc.), on the public Wolfram Cloud, or in a Wolfram Enterprise Private Cloud. And now, with the Wolfram Engine, you can also easily run it deep inside any standard software engineering stack.

(Of course, if you want to use our whole hyperarchitecture spanning desktop, server, cloud, parallel, embedded, mobile—and interactive, development and production computing—then a good entry point is Wolfram|One, and, yes, there are trial versions available.)

Going into Production

OK, so how does the licensing for Free Wolfram Engine for Developers work? For the past 30+ years, our company has had a very straightforward model: we license our software to generate revenue that allows us to continue our long-term mission of continuous, energetic R&D. We’ve also made many important things available for free—like our main Wolfram|Alpha website, Wolfram Player and basic access to the Wolfram Cloud.

The Free Wolfram Engine for Developers is intended for use in pre-production software development. You can use it to develop a product for yourself or your company. You can use it to conduct personal projects at home, at school or at work. And you can use it to explore the Wolfram Language for future production projects. (Here’s the actual license, if you’re curious.)

When you have a system ready to go into production, then you get a Production License for the Wolfram Engine. Exactly how that works will depend on what kind of system you’ve built. There are options for local individual or enterprise deployment, for distributing the Wolfram Engine with software or hardware, for deploying in cloud computing platforms—and for deploying in the Wolfram Cloud or Wolfram Enterprise Private Cloud.

If you’re making a free, open-source system, you can apply for a Free Production License. Also, if you’re part of a Wolfram Site License (of the type that, for example, most universities have), then you can freely use the Free Wolfram Engine for Developers for anything that license permits.

We haven’t worked out all the corners and details of every possible use of the Wolfram Engine. But we are committed to providing predictable and straightforward licensing for the long term (and we’re working to ensure the availability and vitality of the Wolfram Language in perpetuity, independent of our company). We’ve now had consistent pricing for our products for 30+ years, and we want to stay as far away as possible from the many variants of bait-and-switch which have become all too prevalent in modern software licensing.

So Use It!

I’m very proud of what we’ve created with Wolfram Language, and it’s been wonderful to see all the inventions, discoveries and education that have happened with it over decades. But in recent years there’s been a new frontier: the increasingly widespread use of the Wolfram Language inside large-scale software projects. Sometimes the whole project is built in Wolfram Language. Sometimes Wolfram Language is inserted to add some critical computational intelligence, perhaps even just in a corner of the project.

The goal of the Free Wolfram Engine for Developers is to make it easy for anyone to use the Wolfram Language in any software development project—and to build systems that take advantage of its computational intelligence.

We’ve worked hard to make the Free Wolfram Engine for Developers as easy to use and deploy as possible. But if there’s something that doesn’t work for you or your project, please send me mail! Otherwise, please use what we’ve built—and do something great with it!

Wolfram|Alpha at 10
Sat, 18 May 2019 | Stephen Wolfram

Wolfram|Alpha at 10

The Wolfram|Alpha Story

Today it’s 10 years since we launched Wolfram|Alpha. At some level, Wolfram|Alpha is a never-ending project. But it’s had a great first 10 years. It was a unique and surprising achievement when it first arrived, and over its first decade it’s become ever stronger and more unique. It’s found its way into more and more of the fabric of the computational world, both realizing some of the long-term aspirations of artificial intelligence, and defining new directions for what one can expect to be possible. Oh, and by now, a significant fraction of a billion people have used it. And we’ve been able to keep it private and independent, and its main website has stayed free and without external advertising.

Wolfram|Alpha home page

For me personally, the vision that became Wolfram|Alpha has a very long history. I first imagined creating something like it more than 47 years ago, when I was about 12 years old. Over the years, I built some powerful tools—most importantly the core of what’s now Wolfram Language. But it was only after some discoveries I made in basic science in the 1990s that I felt emboldened to actually try building what’s now Wolfram|Alpha.

It was—and still is—a daunting project. To take all areas of systematic knowledge and make them computable. To make it so that any question that can in principle be answered from knowledge accumulated by our civilization can actually be answered, immediately and automatically.

Leibniz had talked about something like this 350 years ago; Turing 70 years ago. But while science fiction (think the Star Trek computer) had imagined it, and AI research had set it as a key goal, 50 years of actual work on question-answering had failed to deliver. And I didn’t know for sure if we were in the right decade—or even the right century—to be able to build what I wanted.

But I decided to try. And it took lots of ideas, lots of engineering, lots of diverse scholarship, and lots of input from experts in a zillion fields. But by late 2008 we’d managed to get Wolfram|Alpha to the point where it was beginning to work. Day by day we were making it stronger. But eventually there was no sense in going further until we could see how people would actually use it.

And so it was that on May 18, 2009, we officially opened Wolfram|Alpha up to the world. And within hours we knew it: Wolfram|Alpha really worked! People asked all kinds of questions, and got successful answers. And it became clear that the paradigm we’d invented of generating synthesized reports from natural language input by using built-in computational knowledge was very powerful, and was just what people needed.

Perhaps because the web interface to Wolfram|Alpha was just a simple input field, some people assumed it was like a search engine, finding content on the web. But Wolfram|Alpha isn’t searching anything; it’s computing custom answers to each particular question it’s asked, using its own built-in computational knowledge—that we’ve spent decades amassing. And indeed, quite soon, it became clear that the vast majority of questions people were asking were ones that simply didn’t have answers already written down anywhere on the web; they were questions whose answers had to be computed, using all those methods and models and algorithms—and all that curated data—that we’d so carefully put into Wolfram|Alpha.

As the years have gone by, Wolfram|Alpha has found its way into intelligent assistants like Siri, and now also Alexa. It’s become part of chatbots, tutoring systems, smart TVs, NASA websites, smart OCR apps, talking (toy) dinosaurs, smart contract oracles, and more. It’s been used by an immense range of people, for all sorts of purposes. Inventors have used it to figure out what might be possible. Leaders and policymakers have used it to make decisions. Professionals have used it to do their jobs every day. People around the world have used it to satisfy their curiosity about all sorts of peculiar things. And countless students have used it to solve problems, and learn.

And in addition to the main, public Wolfram|Alpha, there are now all sorts of custom “enterprise” Wolfram|Alphas operating inside large organizations, answering questions using not only public data and knowledge, but also the internal data and knowledge of those organizations.

It’s fun when I run into high-school and college kids who notice my name and ask “Are you related to Wolfram|Alpha?” “Well”, I say, “actually, I am”. And usually there’s a look of surprise, and a slow dawning of the concept that, yes, Wolfram|Alpha hasn’t always existed: it had to be created, and there was an actual human behind it. And then I often explain that actually I first started thinking about building it a long time ago, when I was even younger than them…

How Come It Actually Worked?

Why Wolfram|Alpha works

When I started building Wolfram|Alpha I certainly couldn’t prove it would work. But looking back, I realize there were a collection of key things—mostly quite unique to us and our company—that ultimately made it possible. Some were technical, some were conceptual, and some were organizational.

On the technical side, the most important was that we had what was then Mathematica, but is now the Wolfram Language. And by the time we started building Wolfram|Alpha, it was clear that the unique symbolic programming paradigm that we’d invented to be the core of the Wolfram Language was incredibly general and powerful—and could plausibly succeed at the daunting task of providing a way to represent all the computational knowledge in the world.

It also helped a lot that there was so much algorithmic knowledge already built into the system. Need to solve a differential equation to compute a trajectory? Just use the built-in NDSolve function! Need to solve a difficult recurrence relation? Just use RSolve. Need to simplify a piece of logic? Use BooleanMinimize. Need to do the combinatorial optimization of finding the smallest number of coins to give change? Use FrobeniusSolve. Need to find out how long to cook a turkey of a certain weight? Use DSolve. Need to find the implied volatility of a financial derivative? Use FinancialDerivative. And so on.
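As a hedged sketch of what one of those calls looks like from outside the system: the coin-change example above (the built-in FrobeniusSolve function) can be reached from Python via the `wolframscript` tool that comes with a Wolfram Engine installation. The helper name `count_change` is my own invention, and the sketch returns `None` where no engine is available.

```python
import shutil
import subprocess

def count_change(amount_in_cents: int):
    """Count how many ways US coins (1, 5, 10, 25 cents) can make the
    given amount, by asking a local Wolfram Engine to evaluate
    Length[FrobeniusSolve[{1, 5, 10, 25}, amount]]. Returns None when
    no wolframscript executable is found."""
    if shutil.which("wolframscript") is None:
        return None  # no local Wolfram Engine available
    code = f"Length[FrobeniusSolve[{{1, 5, 10, 25}}, {amount_in_cents}]]"
    completed = subprocess.run(
        ["wolframscript", "-code", code],
        capture_output=True, text=True,
    )
    return int(completed.stdout.strip())

print(count_change(63))
```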

But what about all that actual data about the world? All the information about cities and movies and food and so on? People might have thought we’d just be able to forage the web for it. But I knew very quickly this wouldn’t work: the data—if it even existed on the web—wouldn’t be systematic and structured enough for us to be able to correctly do actual computations from it, rather than just, for example, displaying it.

So this meant there wouldn’t be any choice but to actually dive in and carefully deal with each different kind of data. And though I didn’t realize it with so much clarity at the time, this is where our company had another extremely rare and absolutely crucial advantage. We’ve always been a very intellectual company (no doubt to our commercial detriment)—and among our staff we, for example, have PhDs in a wide range of subjects, from chemistry to history to neuroscience to architecture to astrophysics. But more than that, among the enthusiastic users of our products we count many of the world’s top researchers across a remarkable diversity of fields.

So when we needed to know about proteins or earthquakes or art history or whatever, it was easy for us to find an expert. At first, I thought the main issue would just be “Where is the best source of the relevant data?” Sometimes that source would be very obvious; sometimes it would be very obscure. (And, yes, it was always fun to run across people who’d excitedly say things like: “Wow, we’ve been collecting this data for decades and nobody’s ever asked for it before!”)

But I soon realized that having raw data was only the beginning; after that came the whole process of understanding it. What units are those quantities in? Does -99 mean that data point is missing? How exactly is that average defined? What is the common name for that? Are those bins mutually exclusive or combined? And so on. It wasn’t just enough to have the data; one also had to have an expert-level dialog with whoever had collected the data.

But then there was another issue: people want answers to questions, not raw data. It’s all well and good to know the orbital parameters for a television satellite, but what most people will actually want to know is where the satellite is in the sky at their location. And to work out something like that requires some method or model or algorithm. And this is where experts were again crucial.

My goal from the beginning was always to get the best research-level results for everything. I didn’t consider it good enough to use the simple formula or the rule of thumb. I wanted to get the best answers that current knowledge could give—whether it was for time to sunburn, pressure in the ocean, mortality curves, tree growth, redshifts in the early universe, or whatever. Of course, the good news was that the Wolfram Language almost always had the built-in algorithmic power to do whatever computations were needed. And it was remarkably common to find that the original research we were using had actually been done with the Wolfram Language.

As we began to develop Wolfram|Alpha we dealt with more and more domains of data, and more and more cross-connections between them. We started building streamlined frameworks for doing this. But one of the continuing features of the Wolfram|Alpha project has been that however good the frameworks are, every new area always seems to involve new and different twists—that can be successfully handled only because we’re ultimately using the Wolfram Language, with all its generality.

Over the years, we’ve developed an elaborate art of data curation. It’s a mixture of automation (these days, often using modern machine learning), management processes, and pure human tender loving care applied to data. I have a principle that there always has to be an expert involved—or you’ll never get the right answer. But it’s always complicated to allocate resources and to communicate correctly across the phases of data curation—and to inject the right level of judgement at the right points. (And, yes, in an effort to make the complexities of the world conveniently amenable to computation, there are inevitably judgement calls involved: “Should the Great Pyramid be considered a building?”, “Should Lassie be considered a notable organism or a fictional character?”, “What was the occupation of Joan of Arc?”, and so on.)

When we started building Wolfram|Alpha, there’d already been all sorts of thinking about how large-scale knowledge should best be represented computationally. And there was a sense that—much like logic was seen as somehow universally applicable—so also there should be a universal and systematically structured way to represent knowledge. People had thought about ideas based on set theory, graph theory, predicate logic, and more—and each had had some success.

Meanwhile, I was no stranger to global approaches to things—having just finished a decade of work on my book A New Kind of Science, which at some level can be seen as being about the theory of all possible theories. But partly because of the actual science I discovered (particularly the idea of computational irreducibility), and partly because of the general intuition I had developed, I had what I now realize was a crucial insight: there’s not going to be a useful general theory of how to represent knowledge; the best you can ever ultimately do is to think of everything in terms of arbitrary computation.

And the result of this was that when we started developing Wolfram|Alpha, we began by just building up each domain “from its computational roots”. Gradually, we did find and exploit all sorts of powerful commonalities. But it’s been crucial that we’ve never been stuck having to fit all knowledge into a “data ontology graph” or indeed any fixed structure. And that’s a large part of why we’ve successfully been able to make use of all the rich algorithmic knowledge about the world that, for example, the exact sciences have delivered.

The Challenge of Natural Language

Perhaps the most obviously AI-like part of my vision for Wolfram|Alpha was that you should be able to ask it questions purely in natural language. When we started building Wolfram|Alpha there was already a long tradition of text retrieval (from which search engines had emerged), as well as of natural language processing and computational linguistics. But although these all dealt with natural language, they weren’t trying to solve the same problem as Wolfram|Alpha. Because basically they were all taking existing text, and trying to extract from it things one wanted. In Wolfram|Alpha, what we needed was to be able to take questions given in natural language, and somehow really understand them, so we could compute answers to them.

In the past, exactly what it meant for a computer to “understand” something had always been a bit muddled. But what was crucial for the Wolfram|Alpha project was that we were finally in a position to give a useful, practical definition: “understanding” for us meant translating the natural language into precise Wolfram Language. So, for example, if a user entered “What was the gdp of france in 1975?” we wanted to interpret this as the Wolfram Language symbolic expression Entity["Country", "France"][Dated["GDP", 1975]].

And while it was certainly nice to have a precise representation of a question like that, the real kicker was that this representation was immediately computable: we could immediately use it to actually compute an answer.
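The Wolfram Language exposes this translation step as the SemanticInterpretation function, so a program can request it from a local Wolfram Engine. Here is a minimal sketch of doing so from Python; the helper name `interpret` is hypothetical, and since SemanticInterpretation generally also needs connectivity to the Wolfram Cloud for its linguistic data, the sketch returns `None` when the engine (or its answer) is unavailable.

```python
import shutil
import subprocess

def interpret(natural_language: str):
    """Ask a local Wolfram Engine to translate a natural language
    question into a symbolic Wolfram Language expression, using
    SemanticInterpretation. Returns the expression's InputForm as a
    string, or None if no engine is installed on this machine."""
    if shutil.which("wolframscript") is None:
        return None  # no local Wolfram Engine available
    code = f'ToString[SemanticInterpretation["{natural_language}"], InputForm]'
    completed = subprocess.run(
        ["wolframscript", "-code", code],
        capture_output=True, text=True,
    )
    return completed.stdout.strip() or None

# Expected to yield something along the lines of
# Entity["Country", "France"][Dated["GDP", 1975]]
print(interpret("gdp of france in 1975"))
```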

In the past, a bane of natural language understanding had always been the ambiguity of things like words in natural language. When you say “apple”, do you mean the fruit or the company? When you say “3 o’clock”, do you mean morning or afternoon? On which day? When you say “springfield”, do you mean “Springfield, MA” or one of the 28 other possible Springfield cities?

But somehow, in Wolfram|Alpha this wasn’t such a problem. And it quickly became clear that the reason was that we had something that no previous attempt at natural language understanding had ever had: we had a huge and computable knowledgebase about the world. So “apple” wasn’t just a word for us: we had extensive data about the properties of apples as fruit and Apple as a company. And we could immediately tell that “apple vitamin C” was talking about the fruit, “apple net income” about the company, and so on. And for “Springfield” we had data about the location and population and notoriety of every Springfield. And so on.

It’s an interesting case where things were made easier by solving a much larger problem: we could be successful at natural language understanding because we were also solving the huge problem of having broad and computable knowledge about the world. And also because we had built the whole symbolic language structure of the Wolfram Language.

There were still many issues, however. At first, I’d wondered if traditional grammar and computational linguistics would be useful. But they didn’t apply well to the often-not-very-grammatical inputs people actually gave. And we soon realized that instead, the basic science I’d done in A New Kind of Science could be helpful—because it gave a conceptual framework for thinking about the interaction of many different simple rules operating on a piece of natural language.

And so we added the strange new job title of “linguistic curator”, and set about effectively curating the semantic structure of natural language, and creating a practical way to turn natural language into precise Wolfram Language. (And, yes, what we did might shed light on how humans understand language—but we’ve been so busy building technology that we’ve never had a chance to explore this.)

How to Answer the Question

OK, so we can solve the difficult problem of taking natural language and turning it into Wolfram Language. And with great effort we’ve got all sorts of knowledge about the world, and we can compute all kinds of things from it. But given a particular input, what output should we actually generate? Yes, there may be a direct answer to a question (“42”, “yes”, whatever). And in certain circumstances (like voice output) that may be the main thing you want. But particularly when visual display is possible, we quickly discovered that people find richer outputs dramatically more valuable.

And so, in Wolfram|Alpha we use the computational knowledge we have to automatically generate a whole report about the question you asked:

Query report

We’ve worked hard on both the structure and content of the information presentation. There’d never been anything quite like it before, so everything had to be invented. At the top, there are sometimes “Assumings” (“Which Springfield did you mean?”, etc.)—though the vast majority of the time, our first choice is correct. We found it worked very well to organize the main output into a series of “pods”, often with graphical or tabular contents. Many of the pods have buttons that allow for drilldown, or alternatives.

Everything is generated programmatically. And which pods are there, with what content, and in what sequence, is the result of lots of algorithms and heuristics—including many that I personally devised. (Along the way, we basically had to invent a whole area of “computational aesthetics”: automatically determining what humans will find aesthetic and easy to interpret.)

In most large software projects, one’s building things to precise specifications. But one of the complexities of Wolfram|Alpha is that so much of what it does is heuristic. There’s no “right answer” to exactly what to plot in a particular pod, over what range. It’s a judgement call. And the overall quality of Wolfram|Alpha directly depends on doing a good job at making a vast number of such judgement calls.

But who should make these judgement calls? It’s not something pure programmers are used to doing. It takes real computational thinking skills, and it also usually takes serious knowledge of each content area. Sometimes similar judgement calls get repeated, and one can just say “do it like that other case”. But given how broad Wolfram|Alpha is, it’s perhaps not surprising that there are an incredible number of different things that come up.

And as we approached the launch of Wolfram|Alpha I found myself making literally hundreds of judgement calls every day. “How many different outputs should we generate here?” “Should we add a footnote here?” “What kind of graphic should we produce in that case?”

In my long-running work on designing Wolfram Language, the goal is to make everything precise and perfect. But for Wolfram|Alpha, the goal is instead just to have it behave as people want—regardless of whether that’s logically perfect. And at first, I worried that with all the somewhat arbitrary judgement calls we were making to achieve that, we’d end up with a system that felt very incoherent and unpredictable. But gradually I came to understand a sort of logic of heuristics, and we developed a good rhythm for inventing heuristics that fit together. And in the end—with a giant network of heuristic algorithms—I think we’ve been very successful at creating a system that broadly just automatically does what people want and expect.

Getting the Project Done

Looking back now, more than a decade after the original development of Wolfram|Alpha, it begins to seem even more surprising—and fortuitous—that the project ended up being possible at all. For it is clear now that it critically relied on a whole collection of technical, conceptual and organizational capabilities that we (and I) happened to have developed by just that time. And had even one of them been missing, it would probably have made the whole project impossible.

But even given the necessary capabilities, there was the matter of actually doing the project. And it certainly took a lot of leadership and tenacity from me—as well as all sorts of specific problem solving—to pursue a project that most people (including many of those working on it) thought, at least at first, was impossible.

How did the project actually get started? Well, basically I just decided one day to do it. And, fortunately, my situation was such that I didn’t really have to ask anyone else about it—and as a launchpad I already had a successful, private company without outside investors that had been running well for more than a decade.

From a standard commercial point of view, most people would have seen the Wolfram|Alpha project as a crazy thing to pursue. It wasn’t even clear it was possible, and it was certainly going to be very difficult and very long term. But I had worked hard to put myself in a position where I could do projects just because I thought they were intellectually valuable and important—and this was one I had wanted to do for decades.

One awkward feature of Wolfram|Alpha as a project is that it didn’t work, until it did. When I tried to give early demos, too little worked, and it was hard to see the point of the whole thing. And this led to lots of skepticism, even from my own management team. So I decided it was best to do the project quietly, without saying much about it. And though it wasn’t my intention, things ramped up to the point where a couple hundred people were working completely under the radar (in our very geographically distributed organization) on the project.

But finally, Wolfram|Alpha really started to work. I gave a demo to my formerly skeptical management team, and by the end of an hour there was uniform enthusiasm, and lots of ideas and suggestions.

And so it was that in the spring of 2009, we prepared to launch Wolfram|Alpha.

The Launch

On March 4, 2009, the domain lit up, with a simple:

Wolfram|Alpha domain page

On March 5, I posted a short (and, in the light of the past decade, satisfyingly prophetic) blog that began:

Wolfram|Alpha Is Coming!

We were adding features and fixing bugs at a furious pace. And rack by rack we were building infrastructure to actually support the system (yes, below all those layers of computational intelligence there are ultimately computers with power cables and network connectors and everything else):

At the beginning, we had about 10,000 cores set up to run Wolfram|Alpha (back then, virtualization wasn’t an option for the kind of performance we wanted). But we had no real idea if this would be enough—or what strange things missed by our simulations might happen when real people started using the system.

We could just have planned to put up a message on the site if something went wrong. But I thought it would be more interesting—and helpful—to actually show people what was going on behind the scenes. And so we decided to do something very unusual—and livestream to the internet the process of launching Wolfram|Alpha.

We planned our initial go-live to occur on the evening of Friday, May 15, 2009 (figuring that traffic would be lower on a Friday evening). And we built our version of a “Mission Control” to coordinate everything:

Mission Control

There were plenty of last-minute issues, many of them captured on the livestream. But in classic Mission Control style, each of our teams finally confirmed that we were “go for launch”—and at 9:33:50 pm CT, I pressed the big “Activate” button, and soon all network connections were open, and Wolfram|Alpha was live to the world.

Pressing the "Activate" button

Queries immediately started flowing in from around the world—and within a couple of hours it was clear that the concept of Wolfram|Alpha was a success—and that people found it very useful. It wasn’t long before bugs and suggestions started coming in too. And for a decade people have been telling us we should give answers about the strangest things (“How many teeth does a snail have?” “How many spiders does the average American eat?” “Which superheroes can hold Thor’s hammer?” “What is the volume of a dog’s eyeball?”).

After our initial go-live on Friday evening, we spent the weekend watching how Wolfram|Alpha was performing (and fixing some hair-raising issues, for example about the routing of traffic to our different colos). And then, on Monday May 18, 2009, we declared Wolfram|Alpha officially launched.

The Growth of Wolfram|Alpha

So what’s happened over the past decade? Every second, there’s been new data flowing into Wolfram|Alpha. Weather. Stock prices. Aircraft positions. Earthquakes. Lots and lots more. Some things update only every month or every year (think: government statistics). Other things update when something happens (think: deaths, elections, etc.). Every week, there are administrative divisions that change in some country around the world. And, yes, occasionally there’s even a new official country (actually, only South Sudan in the past decade).

Wolfram|Alpha has got both broader and deeper in the past decade. There are new knowledge domains. About cat breeds, shipwrecks, cars, battles, radio stations, mines, congressional districts, anatomical structures, function spaces, glaciers, board games, mythological entities, yoga poses and many, many more. Of course, the most obvious domains, like countries, cities, movies, chemicals, words, foods, people, materials, airlines and mountains were already present when Wolfram|Alpha first launched. But over the past decade, we’ve dramatically extended the coverage of these.

What a decade ago was a small or fragmentary area of data, we’ve now systematically filled out—often with great effort. 140,000+ new kinds of food. 350,000 new notable people. 170+ new properties about 58,000 public companies. 100+ new properties about species (tail lengths, eye counts, etc.). 1.6 billion new data points from the US Census. Sometimes we’ve found existing data providers to work with, but quite often we’ve had to painstakingly curate the data ourselves.

It’s amazing how much in the world can be made computable if one puts in the effort. Like military conflicts, for example, which required both lots of historical work, and lots of judgement. And with each domain we add, we’ve put more and more effort into ensuring that it connects with other domains (What was the geolocation of the battle? What historical countries were involved? Etc.).

From even before Wolfram|Alpha launched, we had a wish list of domains to add. Some were comparatively easy. Others—like military conflicts or anatomical structures—took many years. Often, we at first thought a domain would be easy, only to discover all sorts of complicated issues (I had no idea how many different categories of model, make, trim, etc. are important for cars, for example).

In earlier years, we did experiments with volunteer and crowd-sourced data collection and curation. And in some specific areas this worked well, like local information from different countries (how do shoe sizes work in country X?), and properties of fictional characters (who were Batman’s parents?). But as we’ve built out more sophisticated tools, with more automation—as well as tuning our processes for making judgement calls—it’s become much more difficult for outside talent to be effective.

For years, we’ve been the world’s most prolific reporter of bugs in data sources. But with so much computable data about so many things, as well as so many models about how things work, we’re now in an absolutely unique position to validate and cross-check data—and use the latest machine learning to discover patterns and detect anomalies.
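As a generic illustration of these two kinds of automated checks, here is a minimal Python sketch. The population invariant and the z-score threshold are illustrative assumptions of mine, not Wolfram|Alpha’s actual validation rules:

```python
from statistics import mean, stdev

def check_invariant(records):
    # A "theorem" that must be true of the data: a country's urban
    # population can never exceed its total population.
    return [r for r in records if r["urban_pop"] > r["total_pop"]]

def flag_anomalies(values, z_threshold=2.5):
    # Crude statistical-regularity check: flag points far from the mean.
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > z_threshold]

records = [
    {"name": "A", "total_pop": 1000, "urban_pop": 800},
    {"name": "B", "total_pop": 500, "urban_pop": 700},  # violates the "theorem"
]
print(check_invariant(records))                                  # flags record "B"
print(flag_anomalies([10, 11, 9, 10, 12, 10, 11, 9, 10, 500]))  # flags 500
```

Real validation pipelines layer many such checks, and route anything suspicious to a human curator rather than rejecting it outright.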

Of course, data is just one part of the Wolfram|Alpha story. Because Wolfram|Alpha is also full of algorithms—both precise and heuristic—for computing all kinds of things. And over the past decade, we’ve added all sorts of new algorithms, based on recent advances in science. We’ve also been able to steadily polish what we have, covering all those awkward corner cases (“Are angle units really dimensionless or not?”, “What is the country code of a satphone?”, and so on).

One of the big unknowns when we first launched Wolfram|Alpha was how people would interact with it, and what forms of linguistic input they would give. Many billions of queries later, we know a lot about that. We know a thousand ways to ask how much wood a woodchuck can chuck, etc. We know all the bizarre variants people use to specify even simple arithmetic with units. Every day we collect the “fallthroughs”—inputs we didn’t understand. And for a decade now we’ve been steadily extending our knowledgebase and our natural language understanding system to address them.

Ever since we first launched what’s now the Wolfram Language 30+ years ago, we’ve supported things that would now be called machine learning. But over the past decade, we’ve also become leaders in modern neural nets and deep learning. And in some specific situations, we’ve now been able to make good use of this technology in Wolfram|Alpha.

But there’s been no magic bullet, and I don’t expect one. If one wants to get data that’s systematically computable, one can’t forage it from the web, even with the finest modern machine learning. One can use machine learning to make suggestions in the data curation pipeline, but in the end, if you want to get the right answer, you need a human expert who can exercise judgement based on the accumulated knowledge of a field. (And, yes, the same is true of good training sets for many machine learning tasks.)

In the natural language understanding we need to do for Wolfram|Alpha, machine learning can sometimes help, especially in speeding things up. But if one wants to be certain about the symbolic interpretation of natural language, then—a bit like for doing arithmetic—to get good reliability and efficiency there’s basically no choice but to use the systematic algorithmic approach that we’ve been developing for many years.

Something else that’s advanced a lot since Wolfram|Alpha was launched is our ability to handle complex questions that combine many kinds of knowledge and computation. To do this has required several things. It’s needed more systematically computable data, with consistent structure across domains. It’s needed an underlying data infrastructure that can handle more complex queries. And it’s needed the ability to handle more sophisticated linguistics. None of these have been easy—but they’ve all steadily advanced.

By this point, Wolfram|Alpha is one of the more complex pieces of software and data engineering that exists in the world. It helps that it’s basically all written in Wolfram Language. But over time, different parts have outgrown the frameworks we originally built for them. And an important thing we’ve done over the past decade is to take what we’ve learned from all our experience, and use it to systematically build a sequence of more efficient and more general frameworks. (And, yes, it’s never easy refactoring a large software system, but the high-level symbolic character of the Wolfram Language helps a lot.)

There’s always new development going on in the Wolfram|Alpha codebase—and in fact we normally redeploy a new version every two weeks. Wolfram|Alpha is a very complex system to test. Partly that’s because what it does is so diverse. Partly that’s because the world it’s trying to represent is a complex place. And partly it’s because human language usage is so profoundly non-modular. (“3 chains” is probably—at least for now—a length measurement, “2 chains” is probably a misspelling of a rapper, and so on.)
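The flavor of deterministic, rule-based interpretation can be suggested with a toy Python sketch. The grammar and unit table here are my own inventions, vastly simpler than the real linguistics system—and note that the judgement that “2 chains” probably means the rapper is exactly the kind of context-dependent call a toy grammar like this cannot make:

```python
import re

# Tiny unit table (values in meters); a stand-in for a real knowledgebase.
UNIT_IN_METERS = {"chain": 20.1168, "furlong": 201.168, "foot": 0.3048}

def interpret(text):
    # Deterministic grammar: "<number> <unit>", with an optional plural "s".
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s+(\w+?)s?", text.strip().lower())
    if not m:
        return None  # a "fallthrough": input not understood
    value, unit = float(m.group(1)), m.group(2)
    if unit not in UNIT_IN_METERS:
        return None
    return ("Quantity", value, unit)  # symbolic form, akin to Quantity[3, "Chains"]

print(interpret("3 chains"))   # ('Quantity', 3.0, 'chain')
print(interpret("2 Chainz"))   # None -- falls through
```

Collecting and studying the fallthroughs (the `None` cases) is how rule systems like this get systematically extended over time.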

The Long Tail of Knowledge

What should Wolfram|Alpha know about? My goal has always been to have it eventually know about everything. But obviously one’s got to start somewhere. And when we were first building Wolfram|Alpha we started with what we thought were the “most obvious” areas. Of course, once Wolfram|Alpha was launched, the huge stream of actual questions that people ask have defined a giant to-do list, which we’ve steadily been working through, now for a decade.

When Wolfram|Alpha gets used in a new environment, new kinds of questions come up. Sometimes they don’t make sense (like “Where did I put my keys?” asked of Wolfram|Alpha on a phone). But often they do. Like asking Wolfram|Alpha on a device in a kitchen, “Can dogs eat avocado?” (And, yes, we try to give the best answer current science can provide.)

But I have to admit that, particularly before we launched Wolfram|Alpha, I was personally one of our main sources of “we should know about this” input. I collected reference books, seeing what kinds of things they covered. Wherever I went, I looked for informational posters to see what was on them. And whenever I wondered about pretty much anything, I’d try to see how we could compute about it.

“How long will it take me to read this document?” “What country does that license plate come from?” “What height percentile are my kids at?” “How big is a typical 50-year-old oak tree?” “How long can I stay in the sun today?” “What planes are overhead now?” And on and on. Thousands upon thousands of different kinds of questions.

Often we’d be contacting world experts on different, obscure topics—always trying to get definitive computational knowledge about everything. Sometimes it’d seem as if we’d gone quite overboard, working out details nobody would ever possibly care about. But then we’d see people using those details, and sometimes we’d hear “Oh, yes, I use it every day; I don’t know anyplace else to get this right”. (I’ve sometimes thought that if Wolfram|Alpha had been out before 2008, and people could have seen our simulations, they wouldn’t have been caught with so many adjustable-rate mortgages.)

And, yes, it’s a little disappointing when one realizes that some fascinating piece of computational knowledge that took considerable effort to get right in Wolfram|Alpha will—with current usage patterns—probably only be used a few times in a century. But I view the Wolfram|Alpha project in no small part as a long-term effort to encapsulate the knowledge of our civilization, regardless of whether any of it happens to be popular right now.

So even if few people make queries about caves or cemeteries or ocean zones right now, or want to know about different types of paper, or custom screw threads, or acoustic absorption in different materials, I’m glad we’ve got all these things in Wolfram|Alpha. Because now it’s computational knowledge that can be used by anyone, anytime in the future.

Interesting queries

The Business of Wolfram|Alpha

We’ve put—and continue to put—an immense amount of effort into developing and running Wolfram|Alpha. So how do we manage to support doing that? What’s the business model?

The main Wolfram|Alpha website is simply free for everyone. Why? Because we want it to be that way. We want to democratize computational knowledge, and let anyone anywhere use what we’ve built.

Of course, we hope that people who use the Wolfram|Alpha website will want to buy other things we make. But on the website itself there’s simply no “catch”: we’re not monetizing anything. We’re not running external ads; we’re not selling user data; we’re just keeping everything completely private, and always have.

But obviously there are ways in which we are monetizing Wolfram|Alpha—otherwise we wouldn’t be able to do everything we’re doing. At the simplest level, there are subscription-based Pro versions on the website that have extra features of particular interest to students and professionals. There’s a Wolfram|Alpha app that has extra features optimized for mobile devices. There are also about 50 specialized apps (most for both mobile and web) that support more structured access to Wolfram|Alpha, convenient for students taking courses, hobbyists with particular interests, and professionals with standard workflows they repeatedly follow.

Wolfram|Alpha apps

Then there are Wolfram|Alpha APIs—which are widely licensed by companies large and small (there’s a free tier for hobbyists and developers). There are multiple different APIs. Some are optimized for spoken results, some for back-and-forth conversation, some for visual display, and so on. Sometimes the API is used for some very specific purpose (calculus, particular socioeconomic data, tide computations, whatever). But more often it’s just set up to take any natural language query that arrives. (These days, specialized APIs are actually usually better built directly with Wolfram Language, as I’ll discuss a bit later.) Most of the time, the Wolfram|Alpha API runs on our servers, but some of our largest customers have private versions running inside their infrastructure.

Wolfram|Alpha APIs
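For concreteness, here is a minimal Python sketch of constructing a request to the general natural-language API. The endpoint and parameter names follow the public Full Results API documentation; the AppID shown is a placeholder for the per-developer key obtained from the developer portal:

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # only needed to actually send the request

API_URL = "https://api.wolframalpha.com/v2/query"

def build_query_url(query, appid, output="JSON"):
    # Build a Wolfram|Alpha Full Results API request URL.
    return API_URL + "?" + urlencode({"input": query, "appid": appid, "output": output})

url = build_query_url("distance from Earth to Mars", appid="DEMO-APPID")
print(url)
# Sending it requires a real AppID and network access:
# with urlopen(url) as resp:
#     data = resp.read()
```

The other APIs (spoken results, conversational, and so on) follow the same pattern with different endpoints and response shapes.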

When people access Wolfram|Alpha from different parts of the world, we automatically use local conventions for things like units, currency and so on. But when we first built Wolfram|Alpha we fundamentally did it for the English language only. I always believed, though, that the methods for natural language understanding that we invented would work for other languages too, despite all their differences in structure. And it turns out that they do.

Each language is a lot of work, though. Even the best automated translation helps only a little; to get reliable results one has to actually build up a new algorithmic structure for each language. But that’s only the beginning. There’s also the issue of automatic natural language generation for output. And then there’s localized data relevant for the countries that use a particular language.

But we’re gradually working on building versions of Wolfram|Alpha for other languages. Nearly five years ago we actually built a full Wolfram|Alpha for Chinese—but, sadly, regulatory issues in China have so far prevented us from deploying it there. Recently we released a version for Japanese (right now set up to handle mainly student-oriented queries). And we’ve got versions for five other languages in various stages of completion (though we’ll typically need local partners to deploy them properly).

Wolfram|Alpha localization

Beyond Wolfram|Alpha on the public web, there are also private versions of Wolfram|Alpha. In the simplest case, a private Wolfram|Alpha is just a copy of the public Wolfram|Alpha, but running inside a particular organization’s infrastructure. Data updates flow into the private Wolfram|Alpha from the outside, but no queries for the private Wolfram|Alpha ever need to leave the organization.

Ordinary Wolfram|Alpha deals with public computational knowledge. But the technology of Wolfram|Alpha can also be applied to private data in an organization. And in recent years an important part of the business story of Wolfram|Alpha is what we call Enterprise Wolfram|Alpha: custom versions of Wolfram|Alpha that answer questions using both public computational knowledge, and private knowledge inside an organization.

For years I’ve run into CEOs who look at Wolfram|Alpha and say, “I wish I could do that kind of thing with my corporate data; it’d be so much easier for my company to make decisions…” Well, that’s what Enterprise Wolfram|Alpha is for. And over the past several years we’ve been installing Enterprise Wolfram|Alpha in some of the world’s largest companies in all sorts of industries, from healthcare to financial services, retail, and so on.

For a few years now, there’s been a lot of talk (and advertising) about the potential for “applying AI in the enterprise”. But I think it’s fair to say that with Enterprise Wolfram|Alpha we’ve got a serious, enterprise use of AI up and running right now—delivering very successful results.

The typical pattern is that you ask a question in natural language, and Enterprise Wolfram|Alpha then generates a report about the answer, using a mixture of public and private knowledge. “What were our sales of foo-pluses in Europe between Christmas and New Year?” Enterprise Wolfram|Alpha has public knowledge about what dates we’re talking about, and what Europe is. But then it’s got to figure out the internal linguistics of what foo-pluses are, and then go query an internal sales database about how many were sold. Finally, it’s got to generate a report that gives the answer (perhaps both the number of units and dollar amount), as well as, probably, a breakdown by country (perhaps normalized by GDP), comparisons to previous years, maybe a time series of sales by day, and so on.

Needless to say, there’s plenty of subtlety in getting a useful result. Like what the definition of Europe is. Or the fact that Christmas (and New Year’s) can be on different dates in different cultures (and, of course, Wolfram|Alpha has all the necessary data and algorithms). Oh, and then one has to start worrying about currency conversion rates (which of course Wolfram|Alpha has)—as well as about conventions about conversion dates that some particular company may use.
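A toy end-to-end version of such a query can be sketched in Python. Every specific name here—the sales schema, the “Europe” set, the Western date convention—is hypothetical, standing in for public and private knowledge that Enterprise Wolfram|Alpha would resolve for real:

```python
import sqlite3
from datetime import date

# Public knowledge: resolve the date phrase (Western convention assumed).
def resolve_period(year):
    return date(year, 12, 25), date(year, 12, 31)

# Hypothetical internal schema: the private half of the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, country TEXT, day TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?,?,?,?)", [
    ("foo-plus", "Germany", "2018-12-26", 120),
    ("foo-plus", "France",  "2018-12-28", 80),
    ("foo-plus", "Japan",   "2018-12-27", 300),  # outside Europe
])

start, end = resolve_period(2018)
europe = {"Germany", "France"}  # stand-in for a real geographic entity
rows = conn.execute(
    "SELECT country, SUM(units) FROM sales "
    "WHERE product = ? AND day BETWEEN ? AND ? GROUP BY country",
    ("foo-plus", start.isoformat(), end.isoformat())).fetchall()
report = {c: u for c, u in rows if c in europe}
print(report)  # Germany and France totals; Japan is filtered out
```

The real system does far more at every step—entity linguistics, currency conversion, report generation—but the shape of the pipeline (resolve public knowledge, query private data, assemble a report) is the same.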

Like any sophisticated piece of enterprise software, Enterprise Wolfram|Alpha has to be configured for each particular customer, and we have a business unit called Wolfram Solutions that does that. The goal is always to map the knowledge in an organization to a clear symbolic Wolfram Language form, so it becomes computable in the Wolfram|Alpha system. Realistically, for a large organization, it’s a lot of work. But the good news is that it’s possible—because Wolfram Solutions gets to use the whole curation and algorithm pipeline that we’ve developed for Wolfram|Alpha.

Of course, we can use all the algorithmic capabilities of the Wolfram Language too. So if we have to handle textual data we’re ready with the latest NLP tools, or if we want to be able to make predictions we’re ready with the latest statistics and machine learning, and so on.

Businesses started routinely putting their data onto computers more than half a century ago. But now across pretty much every industry, more acutely than ever, the challenge is to actually use that data in meaningful ways. Eventually everyone will take for granted that they can just ask about their data, like on Star Trek. But the point is that with Enterprise Wolfram|Alpha we have the technology to finally make this possible.

It’s a very successful application of Wolfram|Alpha technology, and the business potential for it is amazing. But for us the main limiting factor is that as a business it’s so different from the rest of what we do. Our company is very much focused on R&D—but Enterprise Wolfram|Alpha requires a large-scale customer-facing organization, like a typical enterprise software company. (And, yes, we’re exploring working with partners for this, but setting up such things has proved to be a slow process!)

By the way, people sometimes seem to think that the big opportunity for AI in the enterprise is in dealing with unstructured corporate data (such as free-form text), and finding “needles in haystacks” there. But what we’ve consistently seen is that in typical enterprises most of their data is actually stored in very structured databases. And the challenge, instead, is to answer unstructured queries.

In the past, it’s been basically impossible to do this in anything other than very simple ways. But now we can see why: because you basically need the whole Wolfram|Alpha technology stack to be able to do it. You need natural language understanding, you need computational knowledge, you need automated report generation, and so on. But that’s what Enterprise Wolfram|Alpha has. And so it’s finally able to solve this problem.

But what does it mean? It’s a little bit like when we first introduced Mathematica 30+ years ago. Before then, a typical scientist wouldn’t expect to use a computer themselves for a computation: they’d delegate it to an expert. But one of the great achievements of Mathematica is that it made things easy enough that scientists could actually compute for themselves. And so, similarly, typical executives in companies don’t directly compute answers themselves; instead, they ask their IT department to do it—then hope the results they get back a week later make sense. But the point is that with Enterprise Wolfram|Alpha, executives can actually get questions answered themselves, immediately. And the consequences of that for making decisions are pretty spectacular.

Wolfram|Alpha Meets Wolfram Language

The Wolfram Language is what made Wolfram|Alpha possible. But over the past decade Wolfram|Alpha has also given back big time to the Wolfram Language, delivering both its knowledgebase and its natural language understanding.

It’s interesting to compare Wolfram|Alpha and Wolfram Language. Wolfram|Alpha is for quick computations, specified in a completely unstructured way using natural language, and generating as output reports intended for human consumption. Wolfram Language, on the other hand, is a precise symbolic language intended for building up arbitrarily complex computations—in a way that can be systematically understood by computers and humans.

One of the central features of the Wolfram Language is that it can deal not only with abstract computational constructs, but also with things in the real world, like cities and chemicals. But how should one specify these real-world things? Documentation listing the appropriate way to specify every city wouldn’t be practical or useful. But what Wolfram|Alpha provided was a way to specify real-world things, using natural language.

Inside Wolfram|Alpha, natural language input is translated to Wolfram Language. And that’s what’s now exposed in the Wolfram Language, and in Wolfram Notebooks. Type ctrl+= and a piece of natural language (like “LA”). The output—courtesy of Wolfram|Alpha natural language understanding technology—is a symbolic entity representing Los Angeles. And that symbolic entity is then a precise object that the Wolfram Language can use in computations.

I didn’t particularly anticipate it, but this interplay between the do-it-however-you-want approach of Wolfram|Alpha and the precise symbolic approach of the Wolfram Language is exceptionally powerful. It gets the best of both worlds—and it’s an important element in allowing the Wolfram Language to assume its unique position as a full-scale computational language.

What about the knowledgebase of Wolfram|Alpha, and all the data it contains? Over the past decade we’ve spent immense effort fully integrating more and more of this into the Wolfram Language. It’s always difficult to get data to the point where it’s computable enough to use in Wolfram|Alpha—but it’s even more difficult to make it fully and systematically computable in the way that’s needed for the Wolfram Language.

Imagine you’re dealing with data about oceans. To make it useful for Wolfram|Alpha you have to get it to the point where if someone asks about a specific named ocean, you can systematically retrieve or compute properties of that ocean. But to make it useful for Wolfram Language, you have to get it to the point where someone can do computations about all oceans, with none missing.

A while ago I invented a 10-step hierarchy of data curation. For data to work in Wolfram|Alpha, you have to get it to level 9 in the hierarchy. But to get it to work in Wolfram Language, you have to get it all the way to level 10. And if it takes a few months to get some data to level 9, it can easily take another year to get it to level 10.

So it’s been a big achievement that over the past decade we’ve managed to get the vast majority of the Wolfram|Alpha knowledgebase up to the level where it can be directly used in the Wolfram Language. So all that data is now not only good enough for human consumption, but also good enough that one can systematically build up computations using it.

All the integration with the Wolfram Language means it’s in some sense now possible to “implement Wolfram|Alpha” in a single line of Wolfram Language code. But it also means that it’s easy to make Wolfram Language instant APIs that do more specific Wolfram|Alpha-like things.

There’s an increasing amount of interconnection between Wolfram|Alpha and Wolfram Language. For example, on the Wolfram|Alpha website most output pods have an “Open Code” button, which opens a Wolfram Notebook in the Wolfram Cloud, with Wolfram Language input that corresponds to what was computed in that pod.

In other words, you can use results from Wolfram|Alpha to “seed” a Wolfram Notebook, in which you can then edit or add inputs to do a complete, multi-step Wolfram Language computation. (By the way, you can always generate full Wolfram|Alpha output inside a Wolfram Notebook too.)

Wolfram|Alpha Open Code

Where to Now? The Future of Wolfram|Alpha

When Wolfram|Alpha first launched nobody had seen anything like it. A decade later, people have learned to take some aspects of it for granted, and have gotten used to having it available in things like intelligent assistants. But what will the future of Wolfram|Alpha now be?

Over the past decade we’ve progressively strengthened essentially everything about Wolfram|Alpha—to the point where it’s now excellently positioned for steady long-term growth in future decades. But with Wolfram|Alpha as it exists today, we’re now also in a position to start attacking all sorts of major new directions. And—important as what Wolfram|Alpha has achieved in its first decade has been—I suspect that in time it will be dwarfed by what comes next.

A decade ago, nobody had heard of “fake news”. Today, it’s ubiquitous. But I’m proud that Wolfram|Alpha stands as a beacon of accurate knowledge. And it’s not just knowledge that humans can use; it’s knowledge that’s computable, and suitable for computers too.

More and more is being done these days with computational contracts—both on blockchains and elsewhere. And one of the central things such contracts require is a way to know what’s actually happened in the world—or, in other words, a systematic source of computational facts.

But that’s exactly what Wolfram|Alpha uniquely provides. And already the Wolfram|Alpha API has become the de facto standard for computational facts. But one’s going to see a lot more of Wolfram|Alpha here in the future.

It’s going to put increasing pressure on the reliability of the computational knowledge in Wolfram|Alpha. Because it won’t be long before there will routinely be whole chains of computational contracts—that do important things in the world—and that trigger as soon as Wolfram|Alpha has delivered some particular fact on which they depend.

We’ve developed all sorts of procedures to validate facts. Some are automated—and depend on “theorems” that must be true about data, or cross-correlations or statistical regularities that should exist. Others ultimately rely on human judgement. (A macabre example is our obituary feed: we automatically detect news reports about deaths of people in our knowledgebase. These are then passed to our 24/7 site monitors, who confirm, or escalate the judgement call if needed. Somehow I’m on the distribution list for confirmation requests—and over the past decade there’ve been far too many times when this is how I’ve learned that someone I know has died.)

We take our responsibility as the world’s source of computational facts very seriously, and we’re planning more and more ways to add checks and balances—needless to say, defining what we’re doing using computational contracts.

When we first started developing Wolfram|Alpha, nobody was talking about computational contracts (though, to be fair, I had already thought about them as a potential application of my computational ideas). But now it turns out that Wolfram|Alpha is central to what can be done with them. And as a core component in the long history of the development of systematic knowledge, I think it’s inevitable that over time there will be all sorts of important uses of Wolfram|Alpha that we can’t yet foresee.

In the early days of artificial intelligence, much of what people imagined AI would be like is basically what Wolfram|Alpha has now delivered. So what can now be done with this?

We can certainly put “general knowledge AI” everywhere. Not just in phones and cars and televisions and smart speakers, but also in augmented reality and head- and ear-mounted devices and many other places too.

One of the Wolfram|Alpha APIs we provide is a “conversational” one that can go back and forth clarifying and extending questions. But what about a full Wolfram|Alpha Turing test–like bot? Even after all these years, general-purpose bots have tended to be disappointing. And if one just connects Wolfram|Alpha to them, there tends to be quite a mismatch between general bot responses and “smart facts” from Wolfram|Alpha. (And, yes, in a Turing test competition, the presence of Wolfram|Alpha is a dead giveaway—because it knows much more than any human would.) But with progress in my symbolic discourse language—and probably some modern machine learning—I suspect it’ll be possible to make a more successful general-purpose bot that’s more integrated with Wolfram|Alpha.

But what I think is critical in many future applications of Wolfram|Alpha is to have additional sources of data and input. If one’s making a personal intelligent assistant, for example, then one wants to give it access to as much personal history data (messages, sensor data, video, etc.) as possible. (We already did early experiments on this back in 2011 with Facebook data.)

Then one can use Wolfram|Alpha to ask questions not only about the world in general, but also about one’s own interaction with it, and one’s own history. One can ask those questions explicitly with natural language—or one can imagine, for example, preemptively delivering answers based on video or some other aspect of one’s current environment.

Beyond personal uses, there are also organizational and enterprise ones. And indeed we already have Enterprise Wolfram|Alpha—making use of data inside organizations. So far, we’ve been building Enterprise Wolfram|Alpha systems mainly for some of the world’s largest companies—and every system has been unique and extensively customized. But in time—especially as we deal with smaller organizations that have more commonality within a particular industry—I expect that we’ll be able to make Enterprise Wolfram|Alpha systems that are much more turnkey, effectively by curating the possible structures of businesses and their IT systems.

And, to be clear, the potential here is huge. Because basically every organization in the world is today collecting data. And Enterprise Wolfram|Alpha will provide a realistic way for anyone in an organization to ask questions about their data, and make decisions based on it.

There are so many sources of data for Wolfram|Alpha that one can imagine. It could be photographs from drones or satellites. It could be video feeds. It could be sensor data from industrial equipment or robots. It could be telemetry from inside a game or a virtual world (like from our new UnityLink). It could be the results of a simulation of some system (say in Wolfram SystemModeler). But in all cases, one can expect to use the technology of Wolfram|Alpha to provide answers to free-form questions.

One can think of Wolfram|Alpha as enabling a kind of AI-powered human interface. And one can imagine using it not only to ask questions about existing data, but also as a way to control things, and to get actions taken. We’ve done experiments with Wolfram|Alpha-based interfaces to complex software systems. But one could as well do this with consumer devices, industrial systems, or basically anything that can be controlled through a connection to a computer.

Not everything is best done with pure Wolfram|Alpha—or with something like natural language. Many things are better done with the full computational language that we have in the Wolfram Language. But when we’re using this language, we’re of course still using the Wolfram|Alpha technology stack.

Wolfram|Alpha is already well on its way to being a ubiquitous presence in the computational infrastructure of the world. And between its direct use, and its use in the Wolfram Language, I think we can expect that in the future we’ll all be routinely encountering Wolfram|Alpha.

For many decades our company—and I—have been single-mindedly pursuing the goal of realizing the potential of computation and the computational paradigm. And in doing this, I think we’ve built a unique organization, with unique capabilities.

And looking back a decade after the launch of Wolfram|Alpha, I think it’s no surprise that Wolfram|Alpha has such a unique place in the world. It is, in a sense, the kind of thing that our company is uniquely built to create and develop.

I’ve wanted Wolfram|Alpha for nearly 50 years. And it’s tremendously satisfying to have been able to create what I think will be a defining intellectual edifice in the long history of systematic knowledge. It’s been a good first decade for Wolfram|Alpha. And I begin its second decade with great enthusiasm for the future and for everything that can be done with Wolfram|Alpha.

Happy 10th birthday, Wolfram|Alpha.

What We’ve Built Is a Computational Language (and That’s Very Important!)
Thu, 09 May 2019 — Stephen Wolfram


What Kind of a Thing Is the Wolfram Language?

I’ve sometimes found it a bit of a struggle to explain what the Wolfram Language really is. Yes, it’s a computer language—a programming language. And it does—in a uniquely productive way, I might add—what standard programming languages do. But that’s only a very small part of the story. And what I’ve finally come to realize is that one should actually think of the Wolfram Language as an entirely different—and new—kind of thing: what one can call a computational language.

So what is a computational language? It’s a language for expressing things in a computational way—and for capturing computational ways of thinking about things. It’s not just a language for telling computers what to do. It’s a language that both computers and humans can use to represent computational ways of thinking about things. It’s a language that puts into concrete form a computational view of everything. It’s a language that lets one use the computational paradigm as a framework for formulating and organizing one’s thoughts.

It’s only recently that I’ve begun to properly internalize just how broad the implications of having a computational language really are—even though, ironically, I’ve spent much of my life engaged precisely in the consuming task of building the world’s only large-scale computational language.

It helps me to think about a historical analog. Five hundred years ago, if people wanted to talk about mathematical ideas and operations, they basically had to use human natural language, essentially writing out everything in terms of words. But the invention of mathematical notation about 400 years ago (starting with +, ×, =, etc.) changed all that—and began to provide a systematic structure and framework for representing mathematical ideas.

The consequences were surprisingly dramatic. Because basically it was this development that made modern forms of mathematical thinking (like algebra and calculus) feasible—and that launched the mathematical way of thinking about the world as we know it, with all the science and technology that’s come from it.

Well, I think it’s a similar story with computational language. But now what’s happening is that we’re getting a systematic way to represent—and talk about—computational ideas, and the computational way of thinking about the world. With standard programming languages, we’ve had a way to talk about the low-level operation of computers. But with computational language, we now have a way to apply the computational paradigm directly to almost anything: we have a language and a notation for doing computational X, for basically any field “X” (from archaeology to zoology, and beyond).

There’ve been some “mathematical X” fields for a while, where typically the point is to formulate things in terms of traditional mathematical constructs (like equations), that can then “mechanically” be solved (at least, say, with Mathematica!). But a great realization of the past few decades has been that the computational paradigm is much broader: much more can be represented computationally than just mathematically.

Sometimes one’s dealing with very simple abstract programs (and, indeed, I’ve spent years exploring the science of the computational universe of such programs). But often one’s interested in operations and entities that relate to our direct experience of the world. But the crucial point here is that—as we’ve learned in building the Wolfram Language—it’s possible to represent such things in a computational way. In other words, it’s possible to have a computational language that can talk about the world—in computational terms.

And that’s what’s needed to really launch all those possible “computational X” fields.

What Is Computational Language Like?

Let’s say we want to talk about planets. In the Wolfram Language, planets are just symbolic entities:


EntityList[EntityClass["Planet", All]]

We can compute things about them (here, the mass of Jupiter divided by the mass of Earth):


Entity["Planet", "Jupiter"]["Mass"]/Entity["Planet", "Earth"]["Mass"]

Let’s make an image collage in which the mass of each planet determines how big it’s shown:


ImageCollage[
 EntityClass["Planet", All]["Mass"] -> 
  EntityClass["Planet", All]["Image"]]

To talk about the real world in computational terms, you have to be able to compute things about it. Like here, the Wolfram Language is computing the current position (as I write this) of the planet Mars:


Entity["Planet", "Mars"][EntityProperty["Planet", "HelioCoordinates"]]

And here it’s making a 3D plot of a table of its positions for each of the next 18 months from now:


ListPointPlot3D[
 Table[Entity["Planet", "Mars"][
   Dated[EntityProperty["Planet", "HelioCoordinates"], 
    Now + Quantity[n, "Months"]]], {n, 18}]]

Let’s do another example. Take an image, and find the human faces in it:


FacialFeatures[CloudGet[""], "Image"]

As another example of computation-meets-the-real-world, we can make a histogram (say, in 5-year bins) of the estimated ages of people in the picture:


Histogram[
 FacialFeatures[CloudGet[""], "Age"], {5}]

It’s amazing what ends up being computable. Here are rasterized images of each letter of the Greek alphabet distributed in “visual feature space”:


FeatureSpacePlot[Rasterize /@ Alphabet["Greek"]]

Yes, it is (I think) impressive what the Wolfram Language can do. But what’s more important here is to see how it lets one specify what to do. Because this is where computational language is at work—giving us a way to talk computationally about planets and human faces and visual feature spaces.

Of course, once we’ve formulated something in computational language, we’re in a position (thanks to the whole knowledgebase and algorithmbase of the Wolfram Language) to actually do a computation about it. And, needless to say, this is extremely powerful. But what’s also extremely powerful is that the computational language itself gives us a way to formulate things in computational terms.

Let’s say we want to know how efficient the Roman numeral system was. How do we formulate that question computationally? We might think about knowing the string lengths of Roman numerals, and comparing them to the lengths of modern integers. It’s easy to express that in Wolfram Language. Here’s a Roman numeral:



And here’s its string length:



Now here’s a plot of all Roman numeral lengths up to 200, divided by the corresponding integer lengths—with callouts automatically showing notable values:


ListPlot[
 Table[
  Callout[StringLength[RomanNumeral[n]]/IntegerLength[n], n], {n, 
   200}]]
It’s easy enough to make a histogram for all numbers up to 1000:


Histogram[
 Table[
  StringLength[RomanNumeral[n]]/IntegerLength[n], {n, 1000}]]
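To see what the computational-language difference buys here, it's worth contrasting with a general-purpose programming language, where nothing like RomanNumeral is built in and the conversion has to be written out by hand. Here's a minimal Python sketch of the same length comparison (the `roman` function and its numeral table are illustrative code, not anything from the Wolfram system):

```python
# Hand-rolled Roman-numeral conversion -- the knowledge that a
# general-purpose language makes us supply ourselves.
def roman(n):
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

# Ratio of Roman-numeral length to decimal-digit length for 1..200
ratios = [len(roman(n)) / len(str(n)) for n in range(1, 201)]
print(roman(188), max(ratios))
```

Even this small example needs a dozen lines of bookkeeping before the actual question can be asked—which is precisely the point about having knowledge built into the language.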

But of course in actual usage, some numbers are more common than others. So how can we capture that? Well, here’s one (rather naive) computational approach. Let’s just analyze the Wikipedia article about arithmetic, and see what integers it mentions. Again, that computational concept is easy to express in the Wolfram Language: finding cases of numbers in the article, then selecting those that are interpreted as integers:


Select[IntegerQ][
 TextCases[WikipediaData["arithmetic"], "Number" -> "Interpretation"]]

There are some big numbers, with Roman-numeral representations for which the notion of “string length” doesn’t make much sense:



And then there’s 0, for which the Romans didn’t have an explicit representation. But restricting to “Roman-stringable” numbers, we can make our histogram again:


Histogram[
 Map[StringLength[RomanNumeral[#]]/IntegerLength[#] &][
  Select[IntegerQ[#] && 0 < # < 5000 &][
   TextCases[WikipediaData["arithmetic"], 
    "Number" -> "Interpretation"]]]]

And what’s crucial here is that—with Wolfram Language—we’re in a position to formulate our thinking in terms of computational concepts, like StringLength and TextCases and Select and Histogram. And we’re able to use the computational language to express our computational thinking—in a way that humans can read, and the computer can compute from.

The Difference from Programming Languages

As a practical matter, the examples of computational language we’ve just seen look pretty different from anything one would normally do with a standard programming language. But what is the fundamental difference between a computational language and a programming language?

First and foremost, it’s that a computational language tries to intrinsically be able to talk about whatever one might think about in a computational way—while a programming language is set up to intrinsically talk only about things one can directly program a computer to do. So for example, a computational language can intrinsically talk about things in the real world—like the planet Mars or New York City or a chocolate chip cookie. A programming language can intrinsically talk only about abstract data structures in a computer.

Inevitably, a computational language has to be vastly bigger and richer than a programming language. Because while a programming language just has to know about the operation of a computer, a computational language tries to know about everything—with as much knowledge and computational intelligence as possible about the world and about computation built into it.

To be fair, the Wolfram Language is the sole example that exists of a full-scale computational language. But one gets a sense of magnitude from it. While the core of a standard programming language typically has perhaps a few tens of primitive functions built in, the Wolfram Language has more than 5600—with many of those individually representing major pieces of computational intelligence. And in its effort to be able to talk about the real world, the Wolfram Language also has millions of entities of all sorts built into it. And, yes, the Wolfram Language has had more than three decades of energetic, continuous development put into it.

Given a programming language, one can of course start programming things. And indeed many standard programming languages have all sorts of libraries of functions that have been created for them. But the objective of these libraries is not really the same as the objective of a true computational language. Yes, they’re providing specific “functions to call”. But they’re not trying to create a way to represent or talk about a broad range of computational ideas. To do that requires a coherent computational language—of the kind I’ve been building in the Wolfram Language all these years.

A programming language is (needless to say) intended as something in which to write programs. And while it’s usually considered desirable for humans to be able—at least at some level—to read the programs, the ultimate point is to provide a way to tell a computer what to do. But computational language can also achieve something else. Because it can serve as an expressive medium for communicating computational ideas to humans as well as to computers.

Even when one’s dealing with abstract algorithms, it’s common with standard programming languages to want to talk in terms of some kind of “pseudocode” that lets one describe the algorithms without becoming enmeshed in the (often fiddly) details of actual implementation. But part of the idea of computational language is always to have a way to express computational ideas directly in the language: to have the high-level expressiveness and readability of pseudocode, while still having everything be precise, complete and immediately executable on a computer.

Looking at the examples above, one thing that’s immediately obvious is that having the computational language be symbolic is critical. In most standard programming languages, x on its own without a value doesn’t mean anything; it has to stand for some structure in the memory of the computer. But in a computational language, one’s got to be able to have things that are purely symbolic, and that represent, for example, entities in the real world—that one can operate on just like any other kind of data.
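As a toy illustration of the symbolic idea, here's a deliberately simplified Python sketch (the `Entity` class below is hypothetical, and nothing like the Wolfram Language's actual symbolic-expression machinery):

```python
from dataclasses import dataclass

# A value that is purely symbolic: it stands for something in the
# world, carries no numeric data yet, and can still be stored,
# compared, and passed around like any other piece of data.
@dataclass(frozen=True)
class Entity:
    type: str
    name: str

mars = Entity("Planet", "Mars")
planets = [Entity("Planet", p) for p in ("Mercury", "Venus", "Earth", "Mars")]
print(mars in planets)
```

In a fully symbolic language this behavior comes for free for every expression—which is what makes it possible to treat planets, cities, images and programs uniformly as data.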

There’s a whole cascade of wonderful unifications that flow from representing everything as a symbolic expression, crucial in being able to coherently build up a full-scale computational language. And to make a computational language as readily absorbable by humans as possible, there are also all sorts of detailed issues of interface—like having hierarchically structured notebooks, allowing details of computational language to be iconized for display, and so on.

Why Not Just Use Natural Language?

Particularly in this age of machine learning one might wonder why one would need a precisely defined computational language at all. Why not just use natural language for everything?

Wolfram|Alpha provides a good example (indeed, probably the most sophisticated one that exists today) of what can be done purely with natural language. And indeed for the kinds of short questions that Wolfram|Alpha normally handles, it proves that natural language can work quite well.

But what if one wants to build up something more complicated? Just like in the case of doing mathematics without notation, it quickly becomes impractical. And I could see this particularly clearly when I was writing an introductory book on the Wolfram Language—and trying to create exercises for it. The typical form of an exercise is: “Take this thing described in natural language, and implement it in Wolfram Language”. Early in the book, this worked OK. But as soon as things got more complicated, it became quite frustrating. Because I’d immediately know what I wanted to say in Wolfram Language, but it took a lot of effort to express it in natural language for the exercise, and often what I came up with was hard to read and reminiscent of legalese.

One could imagine that with enough back-and-forth, one might be able to explain things to a computer purely in natural language. But to get any kind of clear idea of what the computer has understood, one needs some more structured representation—which is precisely what computational language provides.

And it’s certainly no coincidence that the way Wolfram|Alpha works is first to translate whatever natural language input it’s given to precise Wolfram Language—and only then to compute answers from it.

In a sense, using computational language is what lets us leverage the last few centuries of exact science and systematic knowledge. Earlier in history, one imagined that one could reason about everything just using words and natural language. But three or four centuries ago—particularly with mathematical notation and other mathematical ideas—it became clear that one could go much further if one had a structured, formal way of talking about the world. And computational language now extends that—bringing a much wider range of things into the domain of formal computational thinking, and going still further beyond natural language.

Of course, one argument for trying to use natural language is that “everybody already knows it”. But the whole point is to be able to apply computational thinking—and to do that systematically, one needs a new way of expressing oneself, which is exactly what computational language provides.

How Computational Language Leverages Natural Language

Computational language is something quite different from natural language, but in its construction it still uses natural language and people’s understanding of it. Because in a sense the “words” in the computational language are based on words in natural language. So, for example, in the Wolfram Language, we have functions like StringLength, TextCases and FeatureSpacePlot.

Each of these functions has a precise computational definition. But to help people understand and remember what the functions do, we use (very carefully chosen) natural language words in their names. In a sense, we’re leveraging people’s understanding of natural language to be able to create a higher level of language. (By the way, with our “code captions” mechanism, we’re able to at least annotate everything in lots of natural languages beyond English.)

It’s a slightly different story when it comes to the zillions of real-world entities that a computational language has to deal with. For a function like TextCases, you both have to know what it’s called, and how to use it. But for an entity like New York City, you just have to somehow get hold of it—and then it’s going to work the same as any other entity. And a convenient way to get hold of it is just to ask for it, by whatever (natural language) name you know for it.

For example, in the Wolfram Language you can just use a “free-form input box”. Type nyc and it’ll get interpreted as the official New York City entity:


Entity["City", {"NewYork", "NewYork", "UnitedStates"}]

You can use this entity to do computations:


Entity["City", {"NewYork", "NewYork", "UnitedStates"}][...]

Of course, this kind of free-form input can be ambiguous. Type ny and the first interpretation is New York state:


Entity["AdministrativeDivision", {"NewYork", "UnitedStates"}]

Press the little dots and you get to say you want New York City instead:

New York City

Entity["City", {"NewYork", "NewYork", "UnitedStates"}]

For convenience, the inputs here are natural language. But the outputs—sometimes after a bit of disambiguation—are precise computational language, ready to be used wherever one wants.

And in general, it’s very powerful to be able to use natural language to specify small chunks of computational language. To express large-scale computational thinking, one needs the formality and structure of computational language. But “small utterances” can be given in natural language—like in Wolfram|Alpha—then translated to precise computational language:

Population of NYC

Entity["City", {"NewYork", "NewYork", "UnitedStates"}][
 EntityProperty["City", "Population"]]
5 largest cities in the world by population

EntityClass[
  "City", {EntityProperty["City", "Population"] -> 
    TakeLargest[5]}] // EntityList
Plot a cosine curve in purple

Plot[Cos[x], {x, -6.6, 6.6}, PlotStyle -> Purple]
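As a toy sketch of this translation step (a pattern-based caricature with a hypothetical two-entry lexicon, written in Python—real Wolfram|Alpha linguistics is vastly more sophisticated):

```python
import re

# A tiny lexicon mapping natural-language words to symbolic forms.
ENTITIES = {"nyc": ("City", "NewYorkCity"), "paris": ("City", "Paris")}
PROPERTIES = {"population": "Population", "area": "Area"}

def parse(utterance):
    """Translate '<property> of <entity>' into a precise symbolic form."""
    m = re.fullmatch(r"(\w+) of (\w+)", utterance.lower())
    if not m:
        return None
    prop, name = m.groups()
    if prop in PROPERTIES and name in ENTITIES:
        etype, ename = ENTITIES[name]
        return f'Entity["{etype}", "{ename}"]["{PROPERTIES[prop]}"]'
    return None

print(parse("population of NYC"))
```

The essential point is the shape of the pipeline: a fuzzy utterance goes in, and what comes out is a precise, unambiguous expression that a computation engine can act on.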

What Computational Language Makes Possible

I think by now there’s little doubt the introduction of the computational paradigm is the single most important intellectual development of the past century. And going forward, I think computational language is going to be crucial in being able to broadly make use of that paradigm—much as many centuries ago, mathematical notation was crucial to launching the widespread use of the mathematical paradigm.

How should one express and communicate the ideas of a “computational X” field? Blobs of low-level programming language code won’t do it. Instead, one needs something that can talk directly about things in the field—whether they are genes, animals, words, battles or whatever. And one also needs something that humans can readily read and understand. And this is precisely what computational language can provide.

Of course, computational language also has the giant bonus that computers can understand it, and that it can be used to specify actual computations to do. In other words, by being able to express something in computational language, you’re not only finding a good way to communicate it to humans, you’re also setting up something that can leverage the power of actual computation to automatically produce things.

And I suspect that in time it will become clear that the existence of computational language as a communication medium is what ultimately succeeded in launching a huge range of computational X fields. Because it’s what will allow the ideas in these fields to be put in a concrete form that people can think in terms of.

How will the computational language be presented? Often, I suspect, it will be part of what I call computational essays. A computational essay mixes natural language text with computational language—and with the outputs of actual computations described by the computational language. It’s a little like how for the past couple of centuries, technical papers have typically relied on mixing text and formulas.

But a computational essay is something much more powerful. For one thing, people can not only read the computational language in a computational essay, but they can also immediately reuse it elsewhere. In addition, when one writes a computational essay, it’s a computer-assisted activity, in which one shares the load with the computer. The human has to write the text and the computational language, but then the computer can automatically generate all kinds of results, infographics, etc. as described by the computational language.

In practice it’s important that computational essays can be presented in Wolfram Notebooks, in the cloud and on the desktop, and that these notebooks can contain all sorts of dynamic and computational elements.

One can expect to use computational essays for a wide range of things—whether papers, reports, exercises or whatever. And I suspect that computational essays, written with computational language, will become the primary form of communication for computational X fields.

I doubt we can yet foresee even a fraction of the places where computational language will be crucial. But one place that’s already clear is in defining computational contracts. In the past, contracts have basically always been written in natural language—or at least in the variant that is legalese. But computational language provides an alternative.

With the Wolfram Language as it is today, we can’t cover everything in every contract. But it’s already clear how we can use computational language to represent many kinds of things in the world that are the subject of contracts. And the point is that with computational language we can write a precise contract that both humans and machines can understand.

In time there’ll be computational contracts everywhere: for commerce, for defining goals, for AI ethics, and so on. And computational language is what will make them all possible.
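To make the idea concrete, here is a minimal sketch, in Python rather than the Wolfram Language, of what "a contract both humans and machines can understand" might look like. The parametric-rainfall scenario, the class name, and the numbers are all hypothetical, invented purely for illustration; the point is only that terms written as code are simultaneously readable and mechanically executable.

```python
from dataclasses import dataclass

@dataclass
class RainfallContract:
    """A toy parametric-insurance contract: if measured rainfall over the
    coverage period is below a threshold, a fixed payout is owed."""
    threshold_mm: float  # rainfall below this level triggers the payout
    payout: float        # amount owed when the contract triggers

    def settle(self, measured_rainfall_mm: float) -> float:
        """Deterministically compute what is owed from a sensor reading.
        A human and a machine read off exactly the same terms."""
        return self.payout if measured_rainfall_mm < self.threshold_mm else 0.0

contract = RainfallContract(threshold_mm=50.0, payout=1000.0)
print(contract.settle(32.5))  # drought season: prints 1000.0
print(contract.settle(71.0))  # normal season: prints 0.0
```

In a real computational contract, the reading would arrive through an agreed machine-readable data feed and execution would be automatic, not invoked by hand.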

When literacy in natural language began to become widespread perhaps 500 years ago, it led to sweeping changes in how the world could be organized, and in the development of civilization. In time I think it’s inevitable that there’ll also be widespread literacy in computational language. Certainly that will lead to much broader application of computational thinking (and, for example, the development of many “computational X” fields). And just as our world today is full of written natural language, so in the future we can expect that there will be computational language everywhere—that both defines a way for us humans to think in computational terms, and provides a bridge between human thinking and the computation that machines and AIs can do.

How Come It’s So Unique?

I’ve talked a lot about the general concept of computational language. But in the world today, there’s actually only one example that exists of a full-scale computational language: the Wolfram Language. At first, it might seem strange that one could say this so categorically. With all the technology out there in the world, how could something be that unique?

But it is. And I suppose this becomes a little less surprising when one realizes that we’ve been working on the Wolfram Language for well over thirty years—or more than half of the whole history of modern computing. And indeed, the span of time over which we’ve been able to consistently pursue the development of the Wolfram Language is now longer than for almost any other software system in history.

Did I foresee the emergence of the Wolfram Language as a full computational language? Not entirely. When I first started developing what’s now the Wolfram Language, I wanted to make it as general as possible, and as flexible as possible in representing computational ideas and processes.

At first, its most concrete applications were to mathematics, and to various kinds of modeling. But as time went on, I realized that more and more types of things could fit into the computational framework that we’d defined. And gradually this started to include things in the real world. Then, about a decade and a half ago, I realized that, yes, with the whole symbolic language we’d defined, we could just start systematically representing all those things like cities and chemicals in pretty much the same way as we’d represented abstract things before.

I’d always had the goal of putting as much knowledge as possible into the language, and of automating as much as possible. But from the beginning I made sure that the language was based on a small set of principles—and that as it grew it maintained a coherent and unified design.

Needless to say, this wasn’t easy. And indeed it’s been my daily activity now for more than 30 years (with, for example, 300+ hours of it livestreamed over the past year). It’s a difficult process, involving both deep understanding of every area the language covers and a string of complicated judgement calls. But the coherence of design this achieves is what has allowed the language to maintain its unity even as it has grown to encompass all sorts of knowledge about the real world, as well as all those other things that make it a full computational language.

Part of what’s made the Wolfram Language possible is the success of its principles and basic framework. But to actually develop it has also involved the creation of a huge tower of technology and content—and the invention of countless algorithms and meta-algorithms, as well as the acquisition and curation of immense amounts of data.

It’s been a strange mixture of intellectual scholarship and large-scale engineering—that we’ve been fortunate enough to be able to consistently pursue for decades. In many ways, this has been a personal mission of mine. And along the way, people have often asked me how to pigeonhole what we’re building. Is it a calculation system? Is it an encyclopedia-like collection of data? Is it a programming language?

Well, it’s all of those things. But they’re only part of the story. And as the Wolfram Language has developed, it’s become increasingly clear how far away it is from existing categories. And it’s only quite recently that I’ve finally come to understand what it is we’ve managed to build: the world’s only full computational language. Having understood this, it starts to be easier to see just how what we’ve been doing all these years fits into the arc of intellectual history, and what some of its implications might be going forward.

From a practical point of view, it’s great to be able to respond to that obvious basic question: “What is the Wolfram Language?” Because now we have a clear answer: “It’s a computational language!” And, yes, that’s very important!

A World Run with Code
Stephen Wolfram, May 2, 2019

This is an edited transcript of a recent talk I gave at a blockchain conference, where I said I’d talk about “What will the world be like when computational intelligence and computational contracts are ubiquitous?”

We live in an interesting time today—a time when we’re just beginning to see the implications of what we might call “the force of computation”. In the end, it’s something that’s going to affect almost everything. And what’s going to happen is really a deep story about the interplay between the human condition, the achievements of human civilization—and the fundamental nature of this thing we call computation.

Stephen Wolfram on a world run with code

This essay is also in:
SoundCloud » Scientific American »

So what is computation? Well, it’s what happens when you follow rules, or what we call programs. Now of course there are plenty of programs that we humans have written to do particular things. But what about programs in general—programs in the abstract? Well, there’s an infinite universe of possible programs out there. And many years ago I turned my analog of a telescope towards that computational universe. And this is what I saw:

Cellular automata

GraphicsGrid[
 Partition[
  Table[ArrayPlot[CellularAutomaton[n, {{1}, 0}, {30, All}], 
    ImageSize -> 40], {n, 0, 255}], 16]]

Each box represents a different simple program. And often they just do something simple. But look more carefully. There’s a big surprise. This is the first example I saw—rule 30:

Rule 30

ArrayPlot[CellularAutomaton[30, {{1}, 0}, {300, All}], 
 PixelConstrained -> 1]

You start from one cell, and you just follow that simple program—but here’s what you get: all that complexity. At first it’s hard to believe that you can get so much from so little. But seeing this changed my whole worldview, and made me realize just how powerful the force of computation is.

Because that’s what’s making all that complexity. And that’s what lets nature—seemingly so effortlessly—make the complexity it does. It’s also what allows something like mathematics to have the richness it does. And it provides the raw material for everything it’s possible for us humans to do.

Now the fact is that we’re only just starting to tap the full force of computation. And actually, most of the things we do today—as well as the technology we build—are specifically set up to avoid it. Because we think we have to make sure that everything stays simple enough that we can always foresee what’s going to happen.

But to take advantage of all that power out there in the computational universe, we’ve got to go beyond that. So here’s the issue: there are things we humans want to do—and then there’s all that capability out there in the computational universe. So how do we bring them together?

Well, actually, I’ve spent a good part of my life trying to solve that—and I think the key is what I call computational language. And, yes, there’s only basically one full computational language that exists in the world today—and it’s the one I’ve spent the past three decades building—the Wolfram Language.

Traditional computer languages—“programming languages”—are designed to tell computers what to do, in essentially the native terms that computers use. But the idea of a computational language is instead to take the kind of things we humans think about, and then have a way to express them computationally. We need a computational language to be able to talk not just about data types and data structures in a computer, but also about real things that exist in our world, as well as the intellectual frameworks we use to discuss them.

And with a computational language, we have not only a way to help us formulate our computational thinking, but also a way to communicate to a computer on our terms.

I think the arrival of computational language is something really important. There’s some analog of it in the arrival of mathematical notation 400 or so years ago—that’s what allowed math to take off, and in many ways launched our modern technical world. There’s also some analog in the whole idea of written language—which launched so many things about the way our world is set up.

But, you know, if we look at history, probably the single strongest systematic trend is the advance of technology. That over time there’s more and more that we’ve been able to automate. And with computation that’s dramatically accelerating. And in the end, in some sense, we’ll be able to automate almost everything. But there’s still something that can’t be automated: the question of what we want to do.

It’s the pattern of technology today, and it’s going to increasingly be the pattern of technology in the future: we humans define what we want to do—we set up goals—and then technology, as efficiently as possible, tries to do what we want. Of course, a critical part of this is explaining what we want. And that’s where computational language is crucial: because it’s what allows us to translate our thinking to something that can be executed automatically by computation. In effect, it’s a bridge between our patterns of thinking, and the force of computation.

Let me say something practical about computational language for a moment. Back at the dawn of the computer industry, we were just dealing with raw computers programmed in machine code. But soon there started to be low-level programming languages, then we started to be able to take it for granted that our computers would have operating systems, then user interfaces, and so on.

Well, one of my goals is to make computational intelligence also something that’s ubiquitous. So that when you walk up to your computer you can take for granted that it will have the knowledge—the intelligence—of our civilization built into it. That it will immediately know facts about the world, and be able to use the achievements of science and other areas of human knowledge to work things out.

Obviously with Wolfram Language and Wolfram|Alpha and so on we’ve built a lot of this. And you can even often use human natural language to do things like ask questions. But if you really want to build up anything at all sophisticated, you need a more systematic way to express yourself, and that’s where computational language—and the Wolfram Language—is critical.

OK, well, here’s an important use case: computational contracts. In today’s world, we’re typically writing contracts in natural language, or actually in something a little more precise: legalese. But what if we could write our contracts in computational language? Then they could always be as precise as we want them to be. But there’s something else: they can be executed automatically, and autonomously. Oh, as well as being verifiable, and simulatable, and so on.

Computational contracts are something more general than typical blockchain smart contracts. Because by their nature they can talk about the real world. They don’t just involve the motion of cryptocurrency; they involve data and sensors and actuators. They involve turning questions of human judgement into machine learning classifiers. And in the end, I think they’ll basically be what run our world.

Right now, most of what the computers in the world do is to execute tasks we basically initiate. But increasingly our world is going to involve computers autonomously interacting with each other, according to computational contracts. Once something happens in the world—some computational fact is established—we’ll quickly see cascades of computational contracts executing. And there’ll be all sorts of complicated intrinsic randomness in the interactions of different computational acts.

In a sense, what we’ll have is a whole AI civilization. With its own activities, and history, and memories. And the computational contracts are in effect the laws of the AI civilization. We’ll probably want to have a kind of AI constitution, that defines how generally we want the AIs to act.

Not everyone or every country will want the same one. But we’ll often want to say things like “be nice to humans”. But how do we say that? Well, we’ll have to use a computational language. Will we end up with some tiny statement—some golden rule—that will just achieve everything we want? The complexity of human systems of laws doesn’t make that seem likely. And actually, with what we know about computation, we can see that it’s theoretically impossible.

Because, basically, it’s inevitable that there will be unintended consequences—corner cases, or bugs, or whatever. And there’ll be an infinite hierarchy of patches one needs to apply—a bit like what we see in human laws.

You know, I keep on talking about computers and AIs doing computation. But actually, computation is a more general thing. It’s what you get by following any set of rules. They could be rules for a computer program. But they could also be rules, say, for some technological system, or some system in nature.

Think about all those programs out in the computational universe. In detail, they’re all doing different things. But how do they compare? Is there some whole hierarchy of who’s more powerful than whom? Well, it turns out that the computational universe is a very egalitarian place—because of something I discovered called the Principle of Computational Equivalence.

Because what this principle says is that all programs whose behavior is not obviously simple are actually equivalent in the sophistication of the computations they do. It doesn’t matter if your rules are very simple or very complicated: there’s no difference in the sophistication of the computations that get done.

It’s been more than 80 years since the idea of universal computation was established: that it’s possible to have a fixed machine that can be programmed to do any possible computation. And obviously that’s been an important idea—because it’s what launched the software industry, and much of current technology.

But the Principle of Computational Equivalence says something more: it says that not only is something like universal computation possible, it’s ubiquitous. Out in the computational universe of possible programs many achieve it, even very simple ones, like rule 30. And, yes, in practice that means we can expect to make computers out of much simpler—say molecular—components than we might ever have imagined. And it means that all sorts of even rather simple software systems can be universal—and can’t be guaranteed secure.

But there’s a more fundamental consequence: the phenomenon of computational irreducibility. Being able to predict stuff is a big thing, for example in traditional science-oriented thinking. But if you’re going to predict what a computational system—say rule 30—is going to do, what it means is that somehow you have to be smarter than it is. But the Principle of Computational Equivalence says that’s not possible. Whether it’s a computer or a brain or anything else, it’s doing computations that have exactly the same sophistication.

So it can’t outrun the actual system itself. The behavior of the system is computationally irreducible: there’s no way to find out what it will do except in effect by explicitly running or watching it. You know, I came up with the idea of computational irreducibility in the early 1980s, and I’ve thought a lot about its applications in science, in understanding phenomena like free will, and so on. But I never would have guessed that it would find an application in proof-of-work for blockchains, and that measurable fractions of the world’s computers would be spending their time purposefully grinding computational irreducibility.
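Rule 30 makes the point tangible. Here is a minimal sketch of it in Python rather than the Wolfram Language: the update rule is tiny, the new cell is just left XOR (center OR right), yet no shortcut is known for its center column. To learn bit n of that column you actually run all n steps, which is computational irreducibility in miniature.

```python
def rule30_step(row):
    """One step of rule 30: new cell = left XOR (center OR right).
    Two cells of zero padding let the pattern grow one cell per side."""
    r = [0, 0] + row + [0, 0]
    return [r[i - 1] ^ (r[i] | r[i + 1]) for i in range(1, len(r) - 1)]

def center_column(steps):
    """Bits down the center of rule 30, started from a single black cell.
    No shortcut is known: to get bit n, all n steps must actually be run."""
    row, bits = [1], [1]
    for _ in range(steps):
        row = rule30_step(row)
        bits.append(row[len(row) // 2])
    return bits

print(center_column(8))  # → [1, 1, 0, 1, 1, 1, 0, 0, 1]
```

The sequence of center bits passes standard randomness tests, which is why running such rules forward, with no way to jump ahead, is structurally similar to what proof-of-work systems ask computers to do.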

By the way, it’s computational irreducibility that means you’ll always have unintended consequences, and you won’t be able to have things like a simple and complete AI constitution. But it’s also computational irreducibility that in a sense means that history is significant: that there’s something irreducible achieved by the course of history.

You know, so far in history we’ve only really had one example of what we’re comfortable calling “intelligence”—and that’s human intelligence. But something the Principle of Computational Equivalence implies is that actually there are lots of things that are computationally just as sophisticated. There’s AI that we purposefully build. But then there are also things like the weather. Yes, we might say in some animistic way “the weather has a mind of its own”. But what the Principle of Computational Equivalence implies is that in some real sense it does: that the hydrodynamic processes in the atmosphere are just as sophisticated as anything going on in our brains.

And when we look out into the cosmos, there are endless examples of sophisticated computation—that we really can’t distinguish from “extraterrestrial intelligence”. The only difference is that—like with the weather—it’s just computation going on. There’s no alignment with human purposes. Of course, that’s a slippery business. Is that graffiti on the blockchain put there on purpose? Or is it just the result of some computational process?

That’s why computational language is important: it provides a bridge between raw computation and human thinking. If we look inside a typical modern neural net, it’s very hard to understand what it does. Same with the intermediate steps of an automated proof of a theorem. The issue is that there’s no “human story” that can be told about what’s going on there. It’s computation, alright. But—a bit like the weather—it’s not computation that’s connected to human experience.

It’s a bit of a complicated thing, though. Because when things get familiar, they do end up seeming human. We invent words for common phenomena in the weather, and then we can effectively use them to tell stories about what’s going on. I’ve spent much of my life as a computational language designer. And in a sense the essence of language design is to identify what common lumps of computational work there are, that one can make into primitives in the language.

And it’s sort of a circular thing. Once one’s developed a particular primitive—a particular abstraction—one then finds that one can start thinking in terms of it. And then the things one builds end up being based on it. It’s the same with human natural language. There was a time when the word “table” wasn’t there. So people had to start describing things with flat surfaces, and legs, and so on. But eventually this abstraction of a “table” appeared. And once it did, it started to get incorporated into the environment people built for themselves.
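A minimal sketch of this point, in Python rather than the Wolfram Language: the function `nest` below is a deliberately tiny invented primitive, playing the role that the built-in Nest plays in the Wolfram Language. Before the abstraction exists, every use re-describes the "flat surface with legs"; once the lump of work has a name, you write, and think, in terms of it.

```python
# Without the abstraction, each use spells out the whole loop by hand:
x = 0.2
for _ in range(5):
    x = 4 * x * (1 - x)

# Naming the lump of computational work turns it into a primitive
# (a toy stand-in for the Wolfram Language's built-in Nest):
def nest(f, value, n):
    """Apply f to value, n times, returning the final result."""
    for _ in range(n):
        value = f(value)
    return value

y = nest(lambda v: 4 * v * (1 - v), 0.2, 5)
assert y == x  # same computation, now expressed through the abstraction
```

Language design, in this view, is largely the search for which such lumps of work recur often enough to deserve names.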

It’s a common story. In mathematics there are an infinite number of possible theorems. But the ones people study are ones that are reached by creating some general abstraction and then progressively building on it. When it comes to computation, there’s a lot that happens in the computational universe—just like there’s a lot that happens in the physical universe—that we don’t have a way to connect to.

It’s like the AIs are going off and leading their own existence, and we don’t know what’s going on. But that’s the importance of computational language, and computational contracts. They’re what let us connect the AIs with what we humans understand and care about.

Let’s talk a little about the more distant future. Given the Principle of Computational Equivalence I have to believe that our minds—our consciousness—can perfectly well be represented in purely digital form. So, OK, at some point the future of our civilization might be basically a trillion souls in a box. There’ll be a complicated mixing of the alien intelligence of AI with the future of human intelligence.

But here’s the terrible thing: looked at from the outside, those trillion souls that are our future will just be doing computations—and from the Principle of Computational Equivalence, those computations won’t be any more sophisticated than the computations that happen, say, with all these electrons running around inside a rock. The difference, though, is that the computations in the box are in a sense our computations; they’re computations that are connected to our characteristics and our purposes.

At some level, it seems like a bad outcome if the future of our civilization is a trillion disembodied souls basically playing videogames for the rest of eternity. But human purposes evolve. I mean, if you tried to explain to someone from a thousand years ago why today we might walk on a treadmill, you’d find it pretty difficult. And I think the good news is that at any time in history, what’s happening then can seem completely meaningful at that time.

The Principle of Computational Equivalence tells us that in a sense computation is ubiquitous. Right now the computation we define exists mostly in the computers we’ve built. But in time, I expect we won’t just have computers: everything will basically be made of computers. A bit like a generalization of how it works with biological life, every object and every material will be made of components that do computations we’ve somehow defined.

But the pressure again is on how we do that definition. Physics gives some basic rules. But we get to say more than that. And it’s computational language that makes what we say be meaningful to us humans.

In the much nearer term, there’s a very important transition: the point at which literacy in computational language becomes truly commonplace. It’s been great with the Wolfram Language that we can now give kids a way to actually do computational thinking for real. It’s great that we can now have computational essays where people get to express themselves in a mixture of natural language and computational language.

But what will be possible with this? In a sense, human language was what launched civilization. What will computational language do? We can rethink almost everything: democracy that works by having everyone write a computational essay about what they want, that’s then fed to a big central AI—which inevitably has all the standard problems of political philosophy. New ways to think about what it means to do science, or to know things. Ways to organize and understand the civilization of the AIs.

A big part of this is going to start with computational contracts and the idea of autonomous computation—a kind of strange merger of the world of natural law, human law, and computational law. Something anticipated three centuries ago by people like Leibniz—but finally becoming real today. Finally a world run with code.
