islanded in the datastream: the limits of a paradigm

> _Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom._

— Clifford Stoll

“I just know I’ve seen it before.”

A simple statement about a straightforward problem, and perhaps the best embodiment of an irony: computers can, in theory, remember everything, and would allow us to access any information that exists within our reach, if we could only define precisely enough what we’re looking for.

It’s something that happens to everyone, in every context: at work, at home, at school. We could be trying to remember a reference to use in a document, or looking for a videoclip to show to a friend, or for a photo that (we think) we took during a trip years ago, an email from two weeks ago that has the information to get to our next meeting, or trying to find an article that we read last month and that we’d like to send to someone.

Google can search for something we’ve never seen out of billions of documents that match a keyword, but we can’t easily find a page we read last week. We create documents and then lose track of them. Our hard drives contain a sea of duplicates, particularly photos, images, and videos. Our email, messaging apps, digital calendars, media libraries, browser histories, and file systems all contain implicit and explicit information about our habits, the people we’ve met, places we’ve been, things we’ve done, but it doesn’t seem to help. Often, our digital world seems to be at the whim of an incompetent version of Maxwell’s Demon, actively turning much of our data into a chaotic soup of ever-increasing entropy that makes it harder and harder to extract the right information at the right time.

We now have countless ways of instantly sharing and communicating. We consume, experience, create and publish in shorter bursts, mainlining tweets, photos, videos, status messages, whole activity feeds.

In the last ten years, the rise of relatively inexpensive, fast, pervasive Internet access, wireless infrastructure, and flexible new approaches to server-side computing has had an enormous effect in many areas, and has contributed to the creation of what are arguably entirely new realities, in a shift of a breadth and depth with few, if any, historical precedents.

But not everything is evolving at the same speed. In today’s seemingly limitless ocean of ideas and innovation, a significant portion of the software we use every day remains stuck in time, in some cases with core function and behavior that has remained literally unchanged for decades.

The software and services that we use to access and interact with our information universe have become optimized around _telling_ us what’s happening now, while those that help us understand and leverage the present and the past, in the form of what we perceive and know, and the future, in the form of learning and making new connections, have become stale and strained under ever-growing data volumes.

Intranets hold data that we need and can’t find, while the Internet keeps pushing us to look at content we may not need but can’t seem to escape. Consumption of media has become nearly frictionless for some types of activities, like entertainment or brief social interactions, while becoming increasingly difficult for others, like work or study activities that involve referencing and recall. Search engines have opened up access to information of all kinds and reduced the barriers to dissemination for new content as soon as it becomes available, while our ability to navigate through the data closest to us, in some cases data we ourselves produce, has in effect decreased.

Meanwhile, the proliferation of computing devices of all types and sizes, of sensors, and of services that provide ambient information provides a solid layer of new capabilities and raw data that we can rely on, but that are rarely used extensively or beyond the obvious.

The limits of a paradigm

This isn’t about just interfaces, or publishing standards, or protocols. It can’t be fixed by a better bookmarking service or a better search engine, or nicer icons on your filesystem browser app. The interfaces, the protocols they use, the storage and processing models in clients and middleware are all tied, interdependent, or rather codependent.

What is happening is that we have reached the limits of paper-centric paradigms, a model of the digital world built upon organizational constructs and abstractions based on the physical world and thus made up of hierarchies of data that can be understood as sequences of pages of printed paper. Hierarchies in which people, us, whom technology should be helping, are at the very bottom of the stack, and what we can see and do is constrained by what’s higher up in whatever hierarchy is at play at the moment.

These hierarchies are everywhere: filesystems, DNS, websites, email folders, music players, URL structures, contact lists. They are embedded in fundamental interface metaphors and in how we structure content, and are even present in the unseen depths of the Internet’s routing systems.

“Paper-centric interfaces” are still pervasive. A significant percentage of modern software could be printed and not only retain most of its information, but also a lot of its “functionality.” For example, what most people do with their iPad calendar could be done as well using a piece of paper with a calendar drawn on it. And most of these things have not only remained unchanged for years, they are in many cases just plain electronic translations of paper-based systems (Cc, Bcc, From, To, all existed as part of corporate memo flows well before email was invented).

The hierarchies embedded in our information systems explain why China has the ability to control (but not in detail) what its citizens can and can’t see on the Internet.

Within these hierarchies, some services have carved out new spaces that show what’s possible when the digital starts to be unleashed from the physical. Wikipedia, Facebook and Twitter are perhaps some of the best examples. Not surprisingly, they’re all fully or partially banned in China, while their “local” copies/competitors can be controlled. But they are universes unto themselves.

There are many other consequences to our over-reliance on hierarchical, centralized architectures. Sites and services go down, and millions of people are affected. Currently, we are on a trajectory in which people have less, rather than more, control over what they see. Privacy is easier to invade, whether by government actors in legitimate (although often overly broad) security sweeps, or by criminals. A single corporate network breach can expose millions of people to fraud or theft.

Large-scale services have to resort to incredibly complex caching systems and massive infrastructure to support their services: the largest Internet cloud services have more than one server for every 1,000 people online.

A lot of this infrastructure seems structurally incapable of change, but it only seems that way. Many companies that run large-scale web services already operate internal systems that no longer resemble anything remotely recognizable as what existed when the protocols they support first came into widespread use. They’re spaceship engines being forced into Ford Model-T frames, twisted into knots to “speak” in a form and at a pace that decades-old conceptual frameworks can process.

Clients, middleware, protocols, what’s visible, and what’s in our pockets and our homes are all stuck in time (examples and discussion on this in future posts). Getting them unstuck will require making changes at multiple levels, but it’s perfectly doable. The digital world is infinitely malleable, and so should be our view of it.