A .data Top-Level Internet Domain?

January 10, 2012

.data

There’s been very little change in top-level internet domains (like .com, .org, .us, etc.) for a long time. But a number of years ago I started thinking about the possibility of having a new .data top-level domain (TLD). And starting this week, there’ll finally be a period when it’s possible to apply to create such a thing.

It’s not at all clear what’s going to happen with new TLDs—or how people will end up feeling about them. Presumably there’ll be TLDs for places and communities and professions and categories of goods and events. A .data TLD would be a slightly different kind of thing. But along with some other interested parties, I’ve been exploring the possibility of creating one.

With Wolfram|Alpha and Mathematica—as well as our annual Data Summit—we’ve been deeply involved with the worldwide data community, and coordinating the creation of a .data TLD would be an extension of that activity.

But what would be the point? For me, it’s about highlighting the exposure of data on the internet—and providing added impetus for organizations to expose data in a way that can efficiently be found and accessed.

In building Wolfram|Alpha, we’ve absorbed an immense amount of data, across a huge number of domains. But—perhaps surprisingly—almost none of it has come in any direct way from the visible internet. Instead, it’s mostly from a complicated patchwork of data files and feeds and database dumps.

But wouldn’t it be nice if there were some standard way to get access to whatever structured data any organization wants to expose?

Right now there are conventions for websites about exposing sitemaps that tell web crawlers how to navigate the sites. And there are plenty of loose conventions about how websites are organized. But there’s really nothing about structured data.
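For comparison, the existing sitemap convention (the Sitemaps protocol published at sitemaps.org) lists page locations plus optional crawl hints such as `lastmod` and `changefreq`. The minimal Python sketch here uses only the standard library to read such a file; the `example.com` URLs are placeholders. It illustrates the point above: a sitemap tells a crawler where pages are, but says nothing about what structured data an organization holds or in what format.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# A tiny example sitemap with placeholder URLs; real ones usually live at /sitemap.xml.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2012-01-10</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/products/</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>"""


def page_urls(sitemap_xml: str) -> list[str]:
    """Return the page URLs listed in a sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    # Each <url> entry carries one <loc>; everything else is an optional hint.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in page_urls(EXAMPLE_SITEMAP):
        print(url)
```

Everything a crawler learns here is a list of locations; there is no slot for describing a dataset's structure or format.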

Now of course today’s web is primarily aimed at two audiences: human readers and search engine crawlers. But with Wolfram|Alpha and the idea of computational knowledge, it’s become clear that there’s another important audience: automated systems that can compute things.

There are product catalogs, store information, event calendars, regulatory filings, inventory data, historical reference material, contact information—lots of things that can be very usefully computed from. But even if these things are somewhere on an organization’s website, there’s no standard way to find them, let alone standard structured formats for them.

My concept for the .data domain is to use it to create the “data web”—in a sense a parallel construct to the ordinary web, but oriented toward structured data intended for computational use. The notion is that alongside a website like wolfram.com, there’d be wolfram.data.

If a human went to wolfram.data, there’d be a structured summary of what data the organization behind it wanted to expose. And if a computational system went there, it’d find just what it needs to ingest the data, and begin computing with it.

Needless to say, as we’ve learned over and over again in building Wolfram|Alpha, getting the underlying data is just the beginning of the story. The real work usually starts when one wants to compute from it—so that one can answer specific questions, generate specific reports, and so on.

For example, in our recent work on making the Best Buy product catalog computable, the original data (which came to us as a database dump) was perfectly easy to read. The real work came in the whole rest of the pipeline that was involved in making that data computable.

But the first step is to get the underlying data. And my concept for the .data domain is to provide a uniform mechanism—accessible to any organization, of any size—for exposing the underlying data.

Now of course one could just start a convention that organizations should have a “/datamap.xml” file (or some such) in the root of their web domains, just like a sitemap—rather than having a whole separate .data site. But I think introducing a new .data top-level domain would give much more prominence to the creation of the data web—and would provide the kind of momentum that’d be needed to get good, widespread standards for the various kinds of data.
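To make the idea of a datamap concrete, here is a minimal Python sketch of what a client might do with one. The format is entirely hypothetical: no “/datamap.xml” standard exists, and the element names (`datamap`, `dataset`, `description`), the attributes (`name`, `format`, `url`), and the `example.data` URLs are invented for illustration. The sketch models the file loosely on a sitemap, except that each entry describes a dataset (its media type and location) rather than a page.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# HYPOTHETICAL format: no datamap standard exists. All element and attribute
# names, and the example.data host, are invented purely for illustration.
EXAMPLE_DATAMAP = """<datamap organization="Example Corp">
  <dataset name="product-catalog" format="text/csv"
           url="https://example.data/products.csv">
    <description>All products currently offered</description>
  </dataset>
  <dataset name="store-locations" format="application/json"
           url="https://example.data/stores.json">
    <description>Store addresses and opening hours</description>
  </dataset>
</datamap>"""


@dataclass(frozen=True)
class DatasetEntry:
    """One dataset an organization chooses to expose (hypothetical schema)."""
    name: str
    media_type: str
    url: str
    description: str


def read_datamap(datamap_xml: str) -> list[DatasetEntry]:
    """Parse the hypothetical datamap into one entry per exposed dataset."""
    root = ET.fromstring(datamap_xml)
    entries = []
    for ds in root.iter("dataset"):
        entries.append(
            DatasetEntry(
                name=ds.get("name", ""),
                media_type=ds.get("format", ""),
                url=ds.get("url", ""),
                description=(ds.findtext("description") or "").strip(),
            )
        )
    return entries


def datasets_of_type(entries: list[DatasetEntry], media_type: str) -> list[DatasetEntry]:
    """Select the datasets a client already knows how to ingest, e.g. only CSV."""
    return [e for e in entries if e.media_type == media_type]


if __name__ == "__main__":
    entries = read_datamap(EXAMPLE_DATAMAP)
    for entry in datasets_of_type(entries, "text/csv"):
        print(entry.name, entry.url)
```

Tagging each entry with a media type is one plausible design choice among many: it lets an automated system pick out the datasets it can already parse and skip the rest, while a human reader still gets a short description of each one.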

What is the relation of all this to the semantic web? The central notion of the semantic web is to introduce markup for human-readable web pages that makes them easier for computers to understand and process. And there’s some overlap here with the concept of the data web. But the bulk of the data web is about providing a place for large lumps of structured data that no human would ever directly want to deal with.

A decade ago I suggested to early search engine pioneers that they could get to the deep web by defining standards for how to expose data from databases. For a while there was enthusiasm about exposing “web services”, and now there are all manner of APIs made available by different organizations.

It’s been interesting for me in the past few years to be involved in the emergence of the modern data community. And from what I have seen, I think we’re now just reaching a critical point, where a wide range of organizations are ready to engage in delivering large-scale structured data in standardized forms. So it is a convenient coincidence that this is happening just when it becomes possible to create a .data top-level domain.

We’re certainly not sure what all the issues about a .data TLD will be, and we’re actively seeking input and partners in this effort. But I think there’s a potentially important opportunity, so I’m trying to do what I can to provide leadership, and further help to accelerate the birth of the data web.

Posted in: Other


55 comments


If you want to make a difference, please invest your time in setting up another root DNS hierarchy independent of the corporate-compromised, censorship-armed, for-profit fiasco that has overrun the free web.

Xap

January 16, 2012 at 10:10 pm
.data TLD would be a great idea. It would make search for data much easier, faster, and perhaps most importantly, reliable.

Payman

February 7, 2012 at 4:22 pm
This will be the future of data: it’ll go public.

Vijayaraj

March 8, 2012 at 9:46 pm
Just do it, then add a layer of something akin to Yahoo! Pipes, although wrest it from them and make it sing; they haven’t (yet) :(

jTA

April 6, 2012 at 5:34 am
Dear Mr. Wolfram: Well, I do really dig what you’re going for. It’s just that when I merge the capabilities of Wolfram|Alpha and the potential scope of the .data TLD project, I’m more than just a little bit reminded of 1984 and Brave New World. I’m sure you’ve noticed that foreshadowing, too. I just hope the boss can’t tell I stay up until 3 am every night playing online, and the USDA doesn’t send the Food Pyramid Police after me! I can just hear it now: a voice that sounds like my grandmother telling me that we need to eat more dark leafy greens. Please don’t misunderstand, though; I’m not being reactionary. Rather, I am sharing my black humor!
On the serious side, though, I can’t wait to see what W/A can do in my life! I think it might be something I can use to help me teach the scientific method to a group of fifth-graders. We have one month to prepare for the Science Olympiad, where they’ll build a suspension bridge from drinking straws, predict how much a foil barge will hold, and do several other events which involve the hypothesize-design-predict-test process. I’m going to assume you’re a busy man, so I won’t expect you to answer this post, but just in case I might as well ask a favor: it would be awesome if you had recommendations to share as to how to introduce the system to students, or specific ideas for applying it to meet their needs. Thank you for all of your special work ~ Liz Uelmen

Liz Uelmen

April 14, 2012 at 8:51 am