"I broke it" vs. "it's broken"

“I don’t know what I did.” (Cue ominous music.)

These are usually the first words that non-tech-nerds utter when asking for help with a computer problem. The printer was working, now it’s not. A certain toolbar has always been there. Now it’s not. Double-clicking on an icon used to load a certain application, now it doesn’t. A network drive used to be available, now it isn’t. And on, and on it goes.

Let’s step back for a moment – quoting from my post a couple of weeks ago, cargo-cult troubleshooting:

There's an interesting aside to this in terms of why we assume that the problem is on our end first, rather than the other. It's what I call the "I broke it vs. It's broken" mindset, of which I'll say more in another post, but that in essence says that with computer systems we tend to look at ourselves, and what is under our control, as the source of the problem, rather than something else. This is changing slowly in some areas, but in a lot of cases, with software in particular, we don't blame the software (or in this case, the internet service). We blame ourselves. As opposed to nearly everything else, where we don't blame ourselves. We say "the car broke down," not "I broke the car." We say "The fridge isn't working properly" as opposed to "I wonder what I did to the fridge that it's no longer working". And so on. We tend to think of Google, Apple, and pretty much anyone else as black boxes that function all the time, generally ignoring that these are enormously complex systems run by non-superhuman beings on non-perfect hardware and software. Mistakes are made. Software has bugs. Operational processes get screwed up. That's how things are; they do the best they can, but nothing's perfect.

This is perhaps my biggest pet peeve with computing devices. Whenever someone tells me “I don’t know what I did,” I give them the fridge example and they say, “good point… but… this was working before.” They see the point, but they still blame themselves. And it drives me crazy. (Note: As I mention in the paragraph, some of this comes from a sense of infallibility we assign to web services, but that’s a slightly different topic and so I’ll leave that for yet another post. What I want to discuss here has to do with personal computing devices themselves).

This didn’t happen by chance. Software has long placed seemingly impossible choices before users. I still chuckle at DOS’s “Abort, Retry, Fail?” prompt when, say, a DIR operation failed. Of course, there’s a meaning to each of those options (never mind that in practice it didn’t make much of a difference which one you chose, since they usually appeared when there was a hardware failure).

Now, this is fairly common with new technologies – early on, many more low-level details are exposed to the user, giving them more opportunities to create problems. The difference with software is its malleability and the fact that we chose, early on, to expose this malleability to everyday users, and many of the initial metaphors were low-level enough that they could easily be misused, like, say, the filesystem (a bit more on that below).

Granted, software does present more opportunities for a user to make a mistake and “break things” than your average fridge, but in my mind that’s not an excuse. Software should be flexible, yes, but it should also be resilient to user choices, allowing easy recovery and an understanding, on the part of the device, of state.

Frequently the source of the error is a hardware problem. These days, automatic updates can also break software or misconfigure settings. This isn’t the user’s fault. Many other times, it was, in fact, something the user did that “broke it.” But my argument is that even in that case it’s our responsibility as software designers to build software that is resilient, if you will, to user choices. Back to the fridge for a moment: you can break a fridge by doing things that, for example, push the motor too hard, or by smashing the controls inside, but it requires time, it’s not easy, and it can’t happen in a split second while you are distracted.

The filesystem is a great example of this problem. It’s too easy to make mistakes. While using it, you have to pay attention not just to the task at hand but also devote a significant amount of attention to the mechanics of the operation, to make sure you don’t, say, wipe out some unrelated documents while trying to create a new one. That’s why I really like the idea of what Google has done with Google Docs and what Apple is trying to do with iCloud in general, pushing the filesystem out of the way to leave in place just a document metaphor, closer to what’s in place in iOS (for an in-depth discussion of this topic, you should read the iCloud section in John Siracusa’s excellent OS X 10.8 Ars Technica review. And if you haven’t yet, read the whole thing while you’re at it.) These new approaches aren’t perfect by any means. Behavior and functionality are still wonky at times, and it’s hard to let go of an interaction metaphor that we have been using for decades, but we have to start somewhere.

There are many more areas in which this kind of shift has to happen, but since in the end it all comes down to data, it’s really, at the core, a difference in how we approach data creation, modification, and deletion: creation should be painless, modification should almost always automatically maintain version information, switching between versions/states should be easy, and deleting information should be very difficult and, in a few select cases, pretty much impossible (if you think this is extreme, consider: do you have the option to delete your computer’s firmware? Not really, and for good reason, but the firmware isn’t the only critical component in a computer). This can apply to UI state, system settings, device setup, display preferences, you name it. Incidentally, all large-scale web services have to implement these notions one way or another. Bringing down your entire web service because of one bad build just won’t do. :)
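To make those principles concrete, here’s a minimal sketch of what they might look like in code. The `DocumentStore` class and its methods are entirely hypothetical, not any real API: saving always appends a version rather than overwriting, older versions stay reachable, and “deleting” is a soft, reversible operation.

```python
from datetime import datetime, timezone

class DocumentStore:
    """Hypothetical sketch: modification keeps history, deletion is reversible."""

    def __init__(self):
        self._versions = {}    # name -> list of (timestamp, content)
        self._deleted = set()  # soft-deleted names, still recoverable

    def save(self, name, content):
        # Creation is painless; modification always appends a new version.
        self._versions.setdefault(name, []).append(
            (datetime.now(timezone.utc), content))
        self._deleted.discard(name)

    def read(self, name, version=-1):
        # Switching between versions is just an index away.
        if name in self._deleted:
            raise KeyError(f"{name} is in the trash; restore it first")
        return self._versions[name][version][1]

    def delete(self, name):
        # "Deleting" hides the document but destroys nothing.
        self._deleted.add(name)

    def restore(self, name):
        self._deleted.discard(name)

store = DocumentStore()
store.save("notes", "draft one")
store.save("notes", "draft two")
store.delete("notes")
store.restore("notes")
print(store.read("notes"))             # latest version survives the round trip
print(store.read("notes", version=0))  # and the older draft is still there
```

A real system would garbage-collect old versions eventually, but the point stands: a user interacting with something shaped like this simply cannot lose work in a split second of distraction.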

We know how we got here. For a while, we had neither the tools nor the computing power and storage to implement these ideas, but that is no longer the case. It’s taken a long time to get these concepts into our heads; it will take a long time to get them out, but I think we’re making progress and we’re on the right path. Here’s hoping.