diego's weblog

there and back again

Category Archives: technology

here’s when you get a sense that the universe is telling you something

In the same Amazon package you get:

    The latest Thomas Pynchon novel.
    The World War Z blu ray.
    Soup.

Telling you what exactly…. well, that is less clear.

the apple developer center downpocalypse


We’re now into day three of the Apple Developer Center being down. This is one of those instances in which Apple’s tendency to “let products speak for themselves,” an approach that ordinarily has a lot going for it, can be counterproductive. In three days we’ve gone from “Downtime, I wonder what they’ll upgrade,” to “Still down, I wonder what’s going on?” to “Still down, something bad is definitely going on.”

Which, btw, is the most likely scenario at this point. If you’ve ever been involved in 24/7 website operations you can picture what life must have been like since Thursday for dozens, maybe hundreds of people at Apple: no sleep, constant calls, writing updates to be passed along the chain, increasingly urgent requests from management wanting to know, exactly, how whatever got screwed up got screwed up, and all of that competing with the much more immediate problem of actually solving the issue.

And a few people in particular, likely less than a dozen, are under particular pressure. I’m not talking about management (although they have pressure of their own) but the few sysadmins, devops, architects and engineers that are at the center of whatever team is responsible for solving the problem, which undoubtedly was also in charge of the actual maintenance that led to the outage in the first place, so the pressure is multiplied.

Even for global operations at massive scale, this is what it usually comes down to — a few people. They’re on the front lines, and hopefully they know that some of us appreciate their efforts and that of the teams working non-stop to solve the problem. I know I do.

The significance of the dev center is hard to see for non-developers, but it’s real and this incident will likely have ripple effects beyond the point of resolution. Days without being able to upload device IDs, or create development profiles. Schedules gone awry. Releases delayed. People will re-evaluate their own contingency plans and maybe question their app store strategy. Thousands of developers are being affected, and ultimately, this will affect Apple’s bottom line.

And that’s why this situation is not the kind of thing that you’ll let go on for this long unless there was a very, very good reason (only a couple of days from reporting quarterly results, no less). Maybe critical data was lost and they’re trying to rebuild it (what if everyone’s App IDs just went up in smoke?). Maybe it was a security breach (what if the root certs were compromised?). The likelihood that there will be consequences for developers, as opposed to just a return to the status quo, goes up with every hour that this continues. As Marco said: “[...]  if you’re an iOS or Mac App Store developer, I’d suggest leaving some free time in the schedule this week until we know what happened to the Developer Center.”

In fact, it could be that at least part of the delay has to do with coming up with procedures and documentation, if not a full-on PR strategy. Apple hasn’t traditionally behaved this way, but Tim Cook has managed things very differently than Steve Jobs in this regard.

Finally, I’ve been somewhat surprised by the lack of actual reporting on this. One day, maybe two days… but three? Nothing much aside from minor posts on a few websites, and not even much on the Apple-dedicated sites. This is where real reporting is necessary. Having sources that can speak to you about what’s going on. Part of the problem is that the eventual impact of this will be subtle, and modern media doesn’t do subtle very well. It’s less about the immediate impact or people out of a job than about a potential gap in future app releases. A whole industry is in fact dependent on what goes on with that little-known service, and with iOS 7/Mavericks being under NDA, Apple’s developer forums, which are also down, are the only place where you can discuss problems and file bug reports. Some developer, somewhere, is no doubt blocked from being able to do any work at all. 

Apple should, perhaps against its own instincts, try their best to explain what happened and how they’ve dealt with it. Otherwise, the feeling that this will just happen again will be hard to shake off for a lot of people. For Apple, this could be an opportunity to engage with their developer community more directly. Here’s hoping.

diego’s life lessons, part III

Excerpted from the upcoming book: “Diego’s life lessons: 99 tips for survival, fun, and profit in today’s baffling bric-a-brac world.” (see Part I and Part II).

#9 make the right career choices

Everyone will have seven careers in their lifetime, someone said once, and we all repeated it even if we have no idea why.

The key to career planning, though, is to keep in mind that while the world of today ranges from complicated to downright baffling, the world of tomorrow will be pretty predictable, since as we all know it will just be a barren hellscape populated by Zombies.

So the question is: post-Zombie Apocalypse, what will you need to be? Survival in the new Zombie-infested world will require the skills of any good D&D party: a Healer, a Warrior, a Thief, and a Wizard — which in a world without magic means someone to tinker with things, build weapons, design shelters with complicated spring traps, and knowledge of how to brew a good cup of coffee.

Clearly you don’t want to be a Healer (read: medic/doctor), since that means no one will be able to fix you — you should have friends or relatives with careers in medicine, however, for obvious reasons. Being a Thief will be of limited use, but more importantly it’s not really the kind of thing you can practice for without turning to a life of crime as defined by our pre-Zombie civilization (post-Zombies, most of the things we consider crimes today will become fairly acceptable somehow, so you may be able to pull this off with the right timing).

That leaves you with either Warrior or Wizard, which translates roughly to: Gun Nut or Hacker. And by “Hacker” we mean the early-1980s definition of hacker, rather than the bastardized 2000s version, and one that is not restricted to computers.

So. Your choices for a new career path are as follows:

  • If you’re a Nerd, become a Hacker.
  • If you’re neither a Nerd nor a Hacker, just become a Gun Nut; it’s the easiest and fastest way to post-apocalyptic survival. This way, while you wait for Zombies to strike you won’t need to worry (for example) about a lookup being O(N) or not, or why the CPU on some random server is pegged at 99% without any incoming requests.
  • If you’re already a Gun Nut, you’re good to go. Just keep buying ammo.
  • If you’re already a Hacker… please don’t turn into an evil genius and destroy the world. Try taking up some activity that will consume your time for no reason, like playing The Elder Scrolls V: Skyrim or learning to program for Blackberry.

NOTE (I): If you’re in the medical profession, just stay put. We will protect you so you can fix our sprained ankles and such.
NOTE (II): there is also the rare combination of Hacker/Nerd+Gun Nut, but you should be aware that this is a highly volatile combination of skills which can have unpredictable results on your psyche.

#45: purchase a small island in the Pacific Ocean

As far as having a permanent vacation spot, this one really is a no-brainer. Why bother with hotels when you can own a piece of slowly sinking real estate? Plus, according to highly reliable sources, you don’t need to be a billionaire.

True, you will have significant coconut-maintenance fees and you’ll probably need a small fleet of Roombas to keep the place tidy, but coconuts are delicious and the Roombas can help in following lesson #18.

NOTE I: don’t be fooled by the “Pacific” part of “Pacific Ocean.” There’s nothing “pacific” about it. There’s storms, cyclones, tsunamis, giant garbage monsters, sharks, jellyfish, and any number of other dangers. Therefore, an important follow-up to purchasing the island is to buy an airline for it. You know, to be able to get away quickly, just in case.

NOTE II: this is actually an alternative to the career choices described above, since it is well known that Zombies can’t swim.

NOTE III: the island should not be named Krakatoa — see lesson #1. Aside from this detail, owning a Pacific Island does not directly conflict with lesson #1, since the cupboard can actually be located in a hut somewhere on the island (multiple cupboard hiding spots are also advisable).

#86 Stock up on Kryptonite

Ok, so let me tell you about this guy… He wears a cape and tights. He frequently disrobes in public places. He makes a living writing for a newspaper with an owner that makes Rupert Murdoch look like Edward R. Murrow. He has deep psychological scars since he is the last survivor of a cataclysmic event that destroyed his civilization. He leads a secret double life, generally disappearing whenever something terrible happens. He is an illegal alien. Also, he is an ALIEN.

Does this look like someone trustworthy to you? Hm?

That’s right. This is not a stable person.

Add to the list that he can fly, even in space, stop bullets, has X-ray vision, can (possibly) travel back in time and is essentially indestructible. How is this guy not a threat to all of humanity?

Lex Luthor was deeply misunderstood — he could see all this, but his messaging was way off. Plus there were all those schemes to Take Over The World, which should really be left to experts like genetically engineered mice.

The only solution to this menace is to keep your own personal stash of Kryptonite. Keep most of it in a cupboard (see lesson #1) and a small amount on your person at all times.

After all, you never know when this madman will show up.

dialtones

When my home phone… you know, the bulky, heavy one, plugged in to a wireline (perhaps for sentimental reasons, at this point), rings… I don’t answer.

Ever.

It is muted. Permanently.

There’s a generation … a group of people, a dividing line, somewhere… for whom the idea of a dialtone, of verified communication, sounds insane. Most of them are kids at this point, sure, but some aren’t. To me, it is noticeable. To others, it is alien.

A dialtone.

Think about it, how many people alive today don’t know what a dialtone is? Have never heard one?

How many people do not answer their phone because they assume it’s spam?

Spam. Email… bits, translated into voice (also bits). Video. TV, or, truthfully, the constructs that TV (and to some degree radio) created.

Advertising.

Something to consider…

the reason behind windows phone’s dominance in some geographies

Via Daring Fireball, Nick Wingfield points to places in the world where Windows Phone is outselling iPhone. Gruber notes, correctly, that these are not Apple strongholds. Blackberry is also extremely popular in those geographies.

What is special about those places? Is it that they have some cultural quirk that prevents them from appreciating iOS?

No. It’s about exchange rates and import controls.

Imports to Argentina, for example, are effectively frozen. People can’t get all sorts of things, from books to electronics. Simple kitchen appliances are in some cases hard to come by. Anecdotally, I can say with some degree of certainty that people would love to get Apple products, and yet Apple products are in extremely short supply since the government denies import licenses unless you export the same amount. Car companies export grains so they can bring in cars. RIM set up a factory in the country just so they could sell phones (you can imagine Apple, given its size and scale, didn’t bother).

As reference, see this Businessweek article:

After months of negotiations, [BMW] figured out a fix. The government agreed to let in BMW’s vehicles as long as the company’s Argentine subsidiary exported an equivalent amount of upholstery leather, car parts … processed rice. Echeagaray worked a deal with the Ministry of Industry to get the necessary import permits.

Russia and India are not exactly the same story but match shades of it. The exchange rate factor is a big issue too (more so in Russia and India than in Argentina): the cost of Apple products translates more directly into dollar terms, since they are manufactured in a few locations worldwide and then priced in dollars, as opposed to being manufactured locally and priced in local currencies. This makes them expensive. No doubt Apple is making a conscious decision here to avoid devaluing their products in real terms.

assume good intentions

A good friend once told me: “Assume good intentions.” Those three words have been hugely influential in my world view in the last few years. Once you make this idea explicit it can shape how you think about what others do in significant ways.

I was reading today about some of the brouhaha surrounding Lean In and the whole why-is-a-billionaire-woman-telling-women-everywhere-what-to-do thing and there was a reference to the launch of Circles.

Gina & Team: congratulations on the launch, it must have been a crazy effort and it looks great.

It seems it’s been building up for a while (the controversy around the book, that is) but I had not seen it until today when I read this article in The New Yorker.

Why I bring this up is that what keeps coming back to me in all of this is how our perspective in the Valley is sometimes clouded by second-hand opinions, innuendo, and gossip, for example around who got funded by whom or which idea is “in”. Yes, this is not unique to the Valley, but it happens frequently here and so I can attest to it, in my own backyard (so to speak… the actual inhabitants of my shared backyard are bluebirds and squirrels).

Putting yourself out there, through a book, art, or even, yes, software, is a hard thing to do. People misunderstand and misinterpret your intentions and motivations constantly, and the schadenfreude that is sadly all-too-common makes things even harder. But we are all just people, trying to do the best we can. The number of significant zeros in your bank account doesn’t change that in most cases. And I say that  having very few significant zeros left in my own bank account.

But, funny thing (not ha-ha funny), most of the people that have such strong opinions on these things have never done them. They “talk about the book” without having “read the book.” (You really need to read The New Yorker article to get this reference). Some of my brothers-in-arms work at Evernote, but do they get press and coverage when they “just” keep an awesome service/app running? No. They get press when someone breaks into their systems.

Controversy sells.

Don’t get me wrong: critics are good. But it’s a matter of degrees. I’m not saying you need to write a book to be able to critique a book, or that you need to start a company to be able to give your opinion on how it should be run, but at the very least spend a moment and consider the effort involved. Avoid ad hominems. Forget about money for a second. Consider how much of their lives these people are sacrificing trying to do something.

Assume good intentions.

I bet that if you did that you’d find yourself a bit more forgiving of missteps, a bit more understanding, a bit more willing to believe.

And for those who are doing it, regardless of the scope or (apparent) size of your project, here’s something I could not say out loud because it would sound terrible given my accent… but I can write it: Gina, Sheryl, and all of you out there who are putting yourselves, your sanity, on the line for an idea: Give ‘em hell.

:-)

kindle paperwhite: good device, but beware the glow

For all fellow book nerds out there, we close the trilogy of kindle reviews for this year, now moving on to a look at Kindle Paperwhite, adding to the plain Kindle review and the Kindle Fire HD.

This device has gotten the most positive reviews we’ve seen this side of an Apple launch. I don’t think I’ve read a single negative review, and most of them are positively glowing with praise. A lot of it is well deserved. The device is light, fast, and the screen is quite good. The addition of light to the screen, which everyone seems bananas about, is also welcome, but there are issues with it that could be a problem depending on your preference (more on that in a bit).

A TOUCH BETTER

Touch response is better than the Kindle touch as well. There are enough minor issues with it that it’s not transparent as an interface — while reading, it’s still too easy to do something you didn’t intend to do (e.g. tap twice and skip ahead more than one page, or swipe improperly on the homescreen and end up opening a book instead of browsing, etc.) but it doesn’t happen so often that it gets in the way. Small annoyance.

Something I do often when reading books is highlight text and –occasionally– add notes for later collection/analysis/etc. Notes are a problem in both Kindles for different reasons (no keyboard in the first, slow-response touch keyboard in the second) but the Paperwhite gets the edge, I think. The Paperwhite is also better than the regular Kindle for selection in most cases (faster, by a mile), with two exceptions: at the end of paragraphs it’s harder than it should be to avoid selecting part of the beginning of the next, and once you highlight, the text gets block-highlighted as opposed to underlined, which not only gets in the way of reading but also results in an ugly flash when the display refreshes as you flip pages. Small annoyances #2 and #3.

Overall though, during actual long-form reading sessions I’d say it works quite well. Its quirks appear of the kind that you can get used to, rather than those that you potentially can’t stand.

THE GLOW THAT THE GLOWING REVIEWS DIDN’T SPEND MUCH TIME ON

Speaking of things you potentially can’t stand, the Paperwhite has a flaw, minor to be sure, but visible: the light at the bottom of the screen generates a weird negative glow, “hotspots” or a kind of blooming effect on the lower-screen area that can be, depending on lighting conditions, brightness, and your own preference, fairly annoying. Now, don’t get me wrong: sans light, this is the best eink screen I’ve ever seen, but the light is on by default and it is a big selling point of the device, so this deserves a bit more attention.

Some of the other reviews mention this either in passing or not at all, with the exception of Engadget where they focused on it (just slightly) beyond a cursory mention.

Pogue over at the NYT:

“At top brightness, it’s much brighter. More usefully, its lighting is far more even than the Nook’s, whose edge-mounted lamps can create subtle “hot spots” at the top and bottom of the page, sometimes spilling out from there. How much unevenness depends on how high you’ve turned up the light. But in the hot spots, the black letters of the text show less contrast.

The Kindle Paperwhite has hot spots, too, but only at the bottom edge, where the four low-power LED bulbs sit. (Amazon says that from there, the light is pumped out across the screen through a flattened fiber optic cable.) In the middle of the page, where the text is, the lighting is perfectly even: no low-contrast text areas.”

The Verge:

“There are some minor discrepancies towards the bottom of the screen (especially at lower light settings), but they weren’t nearly as distracting as what competitors offer.”

Engadget:

“Just in case you’re still unsure, give the Nook a tilt and you’ll see it clearly coming from beneath the bezel. Amazon, on the other hand, has managed to significantly reduce the gap between the bezel and the display. If you look for it, you can see the light source, but unless you peer closely, the light appears to be coming from all sides. Look carefully and you’ll also see spots at the bottom of the display — when on a white page, with the light turned up to full blast. Under those conditions, you might notice some unevenness toward the bottom. On the whole, however, the light distribution is far, far more even than on the GlowLight.”

So it seems clear that the Nook is worse (I haven’t tried it) but Engadget was the only one to show clear shots of the differences between them, although I don’t think their screenshots clearly show what’s going on. Let me add my own to that. Here’s three images:

 

The first is the screen in a relatively low-light environment at 75% screen brightness (photo taken with an iPhone 5, click on them to see them at higher res). The second two are the same image with different Photoshop filters applied to show more clearly what you can perhaps already see in the first image — those black blooming areas at the bottom of the screen, inching upwards.

The effect is slightly more visible with max brightness settings:

What is perhaps most disconcerting is that what is more visible is not the light but the lack of it — the black areas are what’s not as illuminated as the rest before the full effect of light distribution across the display takes place.

Being used to the previous Kindles, when I first turned it on my immediate reaction was to think that I’d gotten a bad unit, especially because this issue hadn’t been something that reviews had put much emphasis on, or seemed to dismiss altogether, but it seems that’s how it is. Maybe it is one of those things that you usually don’t notice but, when you do, you can’t help but notice.

So the question is — does it get in the way? After reading on it for hours I think it’s fair to say that it fades into the background and you don’t really notice it much, but I still kept seeing it, every once in a while, and when I did it would bother me. I don’t know if over time the annoyance –or the effect– will fade, but I’d definitely recommend you try to see it in a store if you can.

THE REST

Weight-wise, while heavier than the regular Kindle, the Paperwhite seems to strike a good balance. You can hold it comfortably in one hand for extended periods of time, and immerse yourself in whatever you’re reading. Speaking of holding it: the material of the bezel is more of a fingerprint magnet than previous Kindles, for some reason, and I find myself cleaning it more often than I’ve done with the others.

The original touch was ok but I still ended up using the lower-end Kindle for regular reading. If I can get over the screen issue, the Paperwhite may be the one touch e-reader to break that cycle. Time will tell.

short answer yes with an if, long answer, no, with a but…

Part 3 of a series (Part 1, Part 2)

HERE WE GO AGAIN

I will look at this from one more angle and then I will let it rest here for future reference, since pretty much everyone else seems, not surprisingly, to have moved on. With the aside on how we go about discussing this topic out of the way (and various other digressions) in my post last Sunday, I wanted to focus a bit on what is perhaps the center of the argument used in the Times article. At the very least, elaborating on the flaws at the center should get us very close to exposing the feebleness of the rest of the argument’s construction.

At the core of the argument is the following paragraph:

Energy efficiency varies widely from company to company. But at the request of The Times, the consulting firm McKinsey & Company analyzed energy use by data centers and found that, on average, they were using only 6 percent to 12 percent of the electricity powering their servers to perform computations. The rest was essentially used to keep servers idling and ready in case of a surge in activity that could slow or crash their operations.

In my response I took issue with this paragraph in two specific areas: 1) that an average is meaningless without more information (e.g. the standard deviation, for starters) and 2) that the measure of utilization they imply, and I say imply because they never make it clear –another flaw–, was one of “performing computations.” I elaborated a bit on the various types of tasks servers may be performing and noted that you couldn’t amalgamate them all into a single value. This is true, but I want to step back a bit into what utilization means and the unmentioned factor that is on the other side of it: efficiency.

WHERE IT BECOMES CLEAR THAT TERMINOLOGY MATTERS (A LOT)

Semantics time: we need to define some terms. Let’s say, just for a moment, to simplify the discussion a bit, that we’re ok talking about utilization as some kind of aggregate. Let’s say that we ignore the specifics of what the servers are doing and we further assume that we will use a percentage of utilization of the “system,” to some percent between 0 and 100, with zero being the system is doing nothing but its own minimal housekeeping tasks, so zero isn’t really zero, but we’re simplifying, and 100 being the system is fully utilized doing something specifically related to the application at hand. I should start though, with some terminology housekeeping by defining what “a system” is.

Definition 0: I will use the words system and machine interchangeably to mean a particular piece of hardware or virtual instance, typically a server, running a particular piece of software. Mixing up virtual systems and machines with actual hardware is a bit of a shortcut, but since in practice virtual systems must eventually map to a real one, it’s one that I think we can live with. (Another shortcut lies in “typically a server” since, say, network switches should also be part of the equation.)

Definition 1: Utilization: a percentage between 0 and 100 of system load related to the specific application tasks at hand.

I shudder at the oversimplification, but I’ll get over it. Probably.

Now, related to utilization is efficiency. While utilization can be said to be an objective concept that is measurable, efficiency can only really be understood relative to something else, and in the case of a piece of software, relative to previous versions of itself. So for example we can’t really say anything reasonable about the efficiency of a V1 piece of software except with respect to imagined possible changes in the future. Conversely, the efficiency of V2, V3, etc. will be defined in relation to whatever version or versions preceded them. Now that we have the definition of utilization, though, we can talk about efficiency in terms of that. So, for example, if V2 uses half the systems that V1 used, then it’s twice as efficient (2x). Or V2 could require 20 machines while V1 required 10, in which case V2 is half as efficient.

Definition 2: Efficiency: the change in utilization between two versions of the system.
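A quick sketch of that definition in Python, using the machine-count examples from above. The function name relative_efficiency is made up for illustration, and it assumes both versions handle the same workload, so that machine counts are directly comparable (the same simplification I keep making).

```python
def relative_efficiency(machines_old: int, machines_new: int) -> float:
    """Efficiency of the new version relative to the old one.

    Assumes both versions serve the same workload, so fewer
    machines for the same work means higher efficiency.
    """
    if machines_old <= 0 or machines_new <= 0:
        raise ValueError("machine counts must be positive")
    return machines_old / machines_new


# V2 needs half the machines V1 needed: twice as efficient.
print(relative_efficiency(10, 5))
# V2 needs 20 machines where V1 needed 10: half as efficient.
print(relative_efficiency(10, 20))
```

A value above 1 means the new version does the same work with less hardware; below 1 means it needs more.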

Once again — for all data center people out there, I’m oversimplifying for the sake of this particular argument.

In the paragraph quoted from the Times above there’s a bit of a jumble of terms. It starts by talking about “energy efficiency” (which, in our definition, would be 100%-Utilization%) and then says “they were using only 6 percent to 12 percent,” which is straight utilization%. I think playing fast and loose with terminology like that can get in the way of really knowing what we’re talking about, which is why I’ve spent some time defining, at least, what I’m talking about.

INSERT OBVIOUS TRANSITIONAL SECTION TITLE HERE

Ok, fine, the squirrel in charge of coming up with section titles is on a break and I can’t think of a good one, so let’s just keep going now that we’re armed with these terms. I’ve repeatedly stated that the average utilization, by itself, is meaningless. Allow me to elaborate on one key reason why. Suppose you measure ten systems and get the following utilization values: 10, 10, 10, 10, 10, 90, 90, 90, 90, 90. This gives you an average utilization of 50% — already something to note, since the average clearly isn’t telling the whole story. Assuming for a moment that these values are comparable across systems (a big if, but again: simplify!), the average is really hiding something important: five systems have “bad” utilization (10%) while the other five have “good” utilization (90%).
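For the skeptical, here is a small Python sketch of the ten-system example. The summarize helper and the choice of population standard deviation as the spread measure are my own assumptions for illustration; any measure of spread would make the same point.

```python
from statistics import mean, pstdev


def summarize(utilization):
    """Return the average, the spread, and the values on each side of the average."""
    avg = mean(utilization)
    return {
        "average": avg,
        "std_dev": pstdev(utilization),  # population standard deviation
        "below_average": [u for u in utilization if u < avg],
        "at_or_above_average": [u for u in utilization if u >= avg],
    }


readings = [10, 10, 10, 10, 10, 90, 90, 90, 90, 90]
summary = summarize(readings)
print(summary["average"])   # 50
print(summary["std_dev"])   # 40.0
```

An average of 50 with a standard deviation of 40 is a strong hint that no system is actually anywhere near 50%: the readings sit in two clumps, and the average lands in the empty space between them.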

But wait, why am I using quotes around “good” and “bad”? Because this is something the article implies: that more utilization is better, but this is exactly why talking about efficiency matters. Maybe, just maybe, the five systems at 10% are actually systems with software that do the same thing but more efficiently, and from that perspective our assessment of good or bad can get inverted.

Suppose the five systems at 90% are part of a cluster. Suppose one of the five systems crashes. Suddenly the load that went to that system has to be distributed to all the others, which quickly puts the remaining four systems at over 100%, and the entire cluster could crash. Among other problems, the load would never get distributed perfectly across systems; in reality we’d probably be looking at a cascade of failures as each individual remaining system crosses the 100% threshold, putting more load on the others, and so forth, ending with the final state where the entire cluster is down and everyone, from the CEO to your users, is screaming bloody murder.

Suddenly we’re looking at those systems at 90% with some suspicion, no? In fact if you keep on the simplification vibe, you could argue that for a five-system cluster the only way to protect it from any one machine going down (assuming a crash at 100% load, also not a given…) is to maintain the average utilization at around 75% or so, which in the event of one system crashing would leave the remaining four at 93.75%. 80% average would mean one system crashing leaves everyone at 100%. So the difference between 75% and 80%, which seems minimal, is the difference between life and death in this scenario.
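The arithmetic is easy to check with a few lines of Python. This is a sketch under the same simplifications as above: load is spread perfectly evenly across the surviving machines and a machine falls over at 100%. The function names are invented for this example.

```python
def load_after_failures(avg_utilization, cluster_size, failed=1):
    """Per-machine utilization once `failed` machines drop out.

    Assumes the total load stays constant and is redistributed
    perfectly evenly across the survivors (it never is in practice).
    """
    survivors = cluster_size - failed
    if survivors <= 0:
        raise ValueError("no machines left to take the load")
    return avg_utilization * cluster_size / survivors


def max_safe_utilization(cluster_size, failed=1, ceiling=100.0):
    """Highest average utilization that keeps survivors at or below `ceiling`."""
    survivors = cluster_size - failed
    if survivors <= 0:
        raise ValueError("no machines left to take the load")
    return ceiling * survivors / cluster_size


print(load_after_failures(75, 5))   # 93.75
print(load_after_failures(80, 5))   # 100.0
print(load_after_failures(90, 5))   # 112.5
print(max_safe_utilization(5))      # 80.0
```

So for five machines, 80% is the hard ceiling with zero margin, and the 90% cluster is already past the point where losing a single machine can be absorbed.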

But I said can get inverted because there are other factors at play. Take just one: uptime requirements. Suppose you’re somehow OK with the idea that one system crashing takes everything down, and you’re willing to trade uptime for costs (i.e., using only five machines). Then you’d be ok, probably. Everyone I know, however, would want to protect against this eventuality.

This scenario isn’t contrived to get the answer to come up the way I want. It is typical, certainly far, far more common than a simple and straightforward DC setup where components get swapped instantly, you’re always at maximum efficiency, and every system does the same thing. Reality isn’t like that. Speaking of which…

A (SMALL) DOSE OF REALITY

Let’s complicate things a bit further, or rather, make them just slightly more realistic. Let’s say that your 90% utilization was measured during a time where you had 100 simultaneous users on average. Tomorrow, though, Pando Daily posts a glowing review of your website and usage doubles. Doesn’t sound too far-fetched. Now what? The 90% utilization, if there’s a strong correlation between simultaneous users and load (which is typical), is suddenly guaranteed to bring you down. Suddenly you’d need much lower baseline utilization to be able to handle that spike, and the 90% looks like a bad idea once again.
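Here is the same point as a Python sketch. It assumes load scales linearly with simultaneous users, which is a stronger claim than “strong correlation” and is only there to keep the numbers simple; the function names are hypothetical.

```python
def projected_utilization(current_utilization, current_users, new_users):
    """Utilization after a change in users, assuming load is linear in users."""
    if current_users <= 0:
        raise ValueError("current_users must be positive")
    return current_utilization * new_users / current_users


def max_baseline_for_spike(spike_factor, ceiling=100.0):
    """Highest baseline utilization that survives traffic growing by spike_factor."""
    if spike_factor <= 0:
        raise ValueError("spike_factor must be positive")
    return ceiling / spike_factor


# 90% utilization at 100 users, then usage doubles.
print(projected_utilization(90, 100, 200))  # 180.0
# To survive a doubling you need to be idling at 50% or less.
print(max_baseline_for_spike(2))            # 50.0
```

Under that assumption, a system that can absorb a doubling of traffic will, by construction, look at most half utilized on a normal day.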

Going a bit further, imagine the two clusters of five systems are actually performing the same function, but one of them is a V1 (90%) and the other one is the more efficient version you just deployed (10%). This is something that happens all the time. Once you can instrument a piece of software running under real-world load, you understand better what leads to that load and you can optimize your software; massive jumps in efficiency are common. So when the 90% is deemed a problem, the team gets to work, and they come up with a V2 that is around an order of magnitude more efficient.

Which leaves you with two possible versions of the software doing the same work, but one uses 10% of the systems and the other 90%. Which one is better? I think everyone would agree that the newer, more efficient system is better, even though deploying it would suddenly make your utilization plummet across the board, which, according to the article, is “bad”. Oops.

Hold on, I hear you say. Now that you’ve got software that is more efficient why don’t you just decommission the extra systems you don’t need? Then you’d be using less power overall and you could cut down, say, from 10 machines at 90% to 5 machines at 20%, which should give you a nice margin for error.
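The arithmetic behind that objection can be sketched as follows. It assumes V2 is exactly ten times more efficient and that the load itself doesn’t change; both are idealizations, and the function name is hypothetical.

```python
def utilization_after_consolidation(machines, utilization, efficiency_gain, new_machines):
    """Utilization per machine after an efficiency gain and a smaller fleet.

    Assumes the total work stays constant and spreads evenly across machines.
    """
    work = machines * utilization        # machine-equivalents of work under V1
    new_work = work / efficiency_gain    # same work under the more efficient V2
    return new_work / new_machines


# 10 machines at 90% under V1, a 10x efficiency gain, then a cut to 5 machines.
util = utilization_after_consolidation(10, 0.90, 10, 5)
print(f"utilization on 5 machines: {util:.0%}")
```

That comes out to 18%, close to the 20% in the example, which is why the objection sounds so reasonable on paper.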

Aha! This surely sounds true, but this is also where the oversimplifications I keep bashing get tricky, since the real world gets in the way. Two interrelated points. First, there aren’t that many (if any) tidy switchovers from V1 to V2. In increasing efficiency you may have introduced bugs. To protect against that, you start having to test (more machines for that!), and even when testing says it’s OK to deploy you will start small: deploy to a few machines only, wait, verify. Deploy a bit more. Wait, verify.
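The “deploy a few, wait, verify” loop looks roughly like the sketch below. It is an illustration only: `deploy` and `healthy` are hypothetical callbacks standing in for whatever your tooling provides, and the waiting period between stages is left out.

```python
def staged_rollout(hosts, stages, deploy, healthy):
    """Deploy to growing fractions of the fleet, stopping at the first failed check.

    Returns the hosts that ended up running the new version.
    """
    done = []
    for fraction in stages:
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[len(done):target]
        for host in batch:
            deploy(host)
        done.extend(batch)
        # A real rollout would wait here (minutes, hours, days) before verifying.
        if not all(healthy(h) for h in done):
            break  # stop expanding; in practice you would also roll back
    return done


hosts = [f"web{i}" for i in range(10)]
deployed = []
result = staged_rollout(
    hosts,
    stages=(0.1, 0.5, 1.0),
    deploy=deployed.append,
    healthy=lambda host: host != "web3",  # pretend web3 misbehaves on V2
)
print(result)
```

In this run the rollout stops after the second stage, with half the fleet on V2 and half still on V1. That leaves you running both versions side by side for a while.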

THE MISSING VARIABLE: TIME

The process we just described has an important variable that we haven’t looked at so far: time. As in, in the process of doing this, time passes.

Sounds obvious, right? But if this is true, it’s also true that as time passes, requirements change. Requirements, here, encapsulating all the factors, external and internal, that go into delivering your service or product. Whatever you’re doing, you’re not dealing with a static entity, but with something that is constantly evolving, both because you’re changing it from inside as you update the software, fix bugs, evolve architecture, and deploy new hardware, and because the external factors are constantly in flux. Likely, you will now have more people using the systems (or, for things that are just APIs, maybe more machines). Or load may have changed due to a feature.

The passage of time here is the critical element missing, and it gets in the way of the ideal scenario in which we allow the system to truly “contract” and use fewer resources in terms of power. By the time you’re sure you can decommission those extra machines, it’s quite possible that you now have other uses for them, and even if you did find a sliver of time to do this you may not have the option, since system growth, or feature changes, or whatever, may be telling you that you will need those extra machines in a week or a month, and therefore taking them down would simply be a waste of time. At Ning, for example, our overall hardware footprint remained largely stable in terms of number of machines, and therefore total power consumed, even as we went from 1 MM registered users to 50 MM and beyond. Setting aside the fact that this took an enormous amount of work and constant vigilance, it also meant that the system that could handle 50 MM users with the same amount of hardware as for 1 MM was very different from the original. And throughout that process, many machines would oscillate in their degree of utilization, from the low teens to the high 70s. Over a period of a few years you could take measurements at different points in time that would make us appear either like geniuses or criminally stupid, if all you looked at was utilization, and if “less utilization is bad” was your only guiding principle.

If we can agree that low or high utilization alone is meaningless, and that your utilization will fluctuate, maybe drastically, as you optimize, we can start to ask more appropriate questions. For example: can we do better in terms of releasing idle capacity as efficiency increases? Absolutely, and the widespread use of virtualization in recent years also means that it’s now far easier to have capacity fluctuate along with load. The APIs for system management popularized by public cloud infrastructure companies (e.g. EC2, Rackspace cloud, etc.) have led in the last couple of years to more and more services where capacity is instantiated on demand according to load, leading to a more efficient use of those virtualized resources.
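As a rough illustration of capacity following load, here is a sketch of the sizing rule an on-demand setup might apply. The function and its parameters are hypothetical; it doesn’t call any real cloud API, and the target utilization and minimum fleet size are assumptions picked for the example.

```python
import math


def instances_needed(current_load, capacity_per_instance,
                     target_utilization=0.6, minimum=2):
    """Instances required so that average utilization stays at or below the target.

    current_load and capacity_per_instance share a unit (e.g. requests/second).
    The minimum keeps some redundancy even when load is close to zero.
    """
    needed = math.ceil(current_load / (capacity_per_instance * target_utilization))
    return max(minimum, needed)


# Load over the course of a day, in requests/second, for 100 req/s instances.
for load in (50, 400, 1000, 250):
    print(load, "req/s ->", instances_needed(load, capacity_per_instance=100), "instances")
```

The target is deliberately well below 100%: the headroom is what absorbs a spike while new instances are still starting up.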

Even there, though, we have a problem, for if EC2 allows you to instantiate and tear down a hundred AMIs without giving it a second thought, it’s also necessarily true that there must be an actual hardware footprint doing nothing but waiting around for that to happen. Having people constantly disconnect and reconnect machines is not just infeasible, it is also something that in almost every case will involve as much waste as taking a system down, since in the world we live in the capacity we need to run the ever-increasing complexity of Internet infrastructure keeps going up, which means that whatever you stop using today, you’re guaranteed to need tomorrow. With 60% or more of humans on the planet still not online, there’s clearly still a lot of growth left. For the biggest services, it’s common to have to constantly be deploying new capacity just to account for growth.

This leads us to yet another question, perhaps the last one in this particular chain of thought: Should we in general accept less reliability from online services given that it has real environmental impact, and real cost?

My own answer to this would be a clear NO. I don’t think you can have these systems simultaneously be part of the fabric of society (a point I’ve made before) and have them be “partially reliable,” just like there’s no way to be “partially pregnant.” Reliability is intrinsically tied to the usefulness of these services. Perhaps there are ways in which we can bake in more asynchronous behavior in some cases, but when a lot of what systems do is real-time, 24/7/365 and worldwide, this isn’t something we’ll be able to exploit frequently. We have crossed the Rubicon, so to speak, and have to see this through.

THE CONCLUSION, OR, BELATEDLY MAKING SENSE OF THE TITLE OF THIS POST

Utilization in data centers is an important issue, but talking about it bereft of context is not really that useful. In particular, talking about it without also talking about efficiency, and all the parameters that go into it, including what kinds of applications are running, what the goals are, what the requirements are, etc., is going to leave us with nothing but incomplete answers, and here incomplete will leave us way too close to incorrect for comfort.

And this isn’t just about the Internet. Is it a valid question to talk about utilization in, say, TV stations? Or other major sources of media for that matter? Pre-digital music distribution…? Sure it is, but it’s not something we focus on because the use of energy in other media is one or two steps removed from what we see, so it’s easier to ignore even if it’s there all the same. When was the last time you remember a TV station couldn’t broadcast? Are we to believe that they never had a power failure? No. They have backup systems. Those evil-sounding lead batteries or diesel generators.

Context.

By looking at context and simply shifting to the assumption that Internet infrastructure is actually run fairly well, given all the requirements and its rapid evolution, we realize that what we really should be wondering about is what makes us build systems this way, rather than assuming they are not built properly. Should we talk about utilization? Or is this really about what drives it, and therefore utilization is part of the discussion but not the central point?

Channeling Reverend Lovejoy for a moment, we could say, then: “Short answer yes with an if, long answer, no, with a but…”

santa claus conquers the martians

Part 2 of a series (Part 1, Part 3)

PRELUDE

I’ve had a busy week, and have been trying to sit down and put together a followup to my response to the NYT’s article on data centers.

I write the title, and as soon as I do, my mind goes blank. I read the title again. What the hell was I thinking? I am looking at the screen, white space extends below the blinking cursor, mirrored by something somehow stuck in my head, alternating on/off, rumbling lowly like an idling engine: I swear I had a point.

So naturally I start to think that this, perhaps, should be the new title. Which, in the expected recursion path that would follow, naturally ends up in another meta-commentary paragraph (also with a simile close to its ending), which I decide not to write. Recursion upwards, probably to conform with an implicit image of happiness we may or may not feel (or that in this case is really quite unwarranted and even more, even worse: unnecessary) but we should generally imply anyway, because these days if you’re not explicitly happy something must be wrong, and therefore it must be fixed. Neutral has become a bad state to be in, apparently, long after being “with us or against us” became a common way to think about nearly everything. No, recursion has no direction except, perhaps, into itself, but it now occurs to me that years of looking at function call stacks have trained me (hopelessly comes to mind, but that’s also not happy) to think of recursion as up or down, rather than, say, horizontally from right to left.

Fascinating, I know.

– oOo –

I will eventually get to Santa Claus and the Martians, but for the moment, back to the article.

The series was titled “The Cloud Factories”, and right there it broadcast ever-so-subtly that it was to be something intended to get worked up about.

“Factory” can mean “the seat of some kind of production” but in this case the weight of the word is in the manufacturing angle. This doesn’t quite feel right, though. A factory is where things are built, sequentially, or at least mostly sequentially, and a cloud is anything but built, and the process is anything but sequential. A cloud emerges, and if we switch to the definite article and the proper noun with all its implications and uppercaseness, it’s also true that The Cloud is an emergent phenomenon. Metaphors are often misapplied, can be incorrect, but it’s not that often that a metaphor involving an overloaded term (“cloud”) is both misapplied and incorrect in the exact same way for nearly all the meanings of the term. This takes some skill.

So, yeah, the point of the title of the series was not to be accurate as an analogy, but to evoke. Specifically, an image. Much like the factory in which they make Itchy & Scratchy cartoons in The Simpsons has chimneys and dark dense smoke coming out of them, as does every factory in The Simpsons, regardless of what it’s for. The “factories” in “The Cloud Factories” seem to intentionally or not (but can this really be unintentional?) transmit the idea of dirt we associate at a reptilian level with “factory”. Dirt. Pollution. Guilt by association. Then the title of the article, the first of two so far, drops the subtle imagery: “Power, Pollution and the Internet.” Strangely enough, beyond the title the word “pollution” appears exactly once in the entire article.

Pollution and the Internet. How could one not react to that? What I wrote a week ago was pure reaction, if nothing else to the reactionary tone of the article, but by now I have accumulated enough in my head to maybe add something else to this topic, which, perhaps predictably, has a bit less to do with the contents of the article itself (not that that topic is exhausted by any means) and more with one possible way to look at its main thrust through the lens of discourse on technology nowadays, how we use metaphors and analogies to convey something that we haven’t yet internalized, and the factors at play in sustaining a reasonable and reasonably deep conversation in an environment that doesn’t lend itself to that. And if all of this in retrospect looks obvious, consider this the admittedly convoluted way in which I am creating a reminder, a mental note: something to pay more attention to.

On to it, then.

ACTION REACTION RETRACTION

Action — argument (paraphrasing, summarizing): “That which powers our online services and more generally the Internet is really a hidden pollution machine run by people fearful of reducing waste, even though the means to do so are readily available.”

Reaction — counterargument (now really summarizing): “Not true.”

That the argument isn’t true may be indeed true, and yet to not just agree with the counterargument because, for example, you respect whoever made it, but to actually understand it, requires a degree of experience and training and knowledge that is well beyond what most people could get to because, quite simply, they have their own jobs and lives. Indeed, if it’s not your job and it’s not your life (and for most of the people for whom this is a job, it’s also our life), you really shouldn’t bother. The modern world, and to some degree the very basis of our progress, rests on the fact that we use things that we can’t build, and in many cases can’t even understand. We travel by plane even though many people have no idea how it works, let alone are able to build one.

And that is just fine.

We trust the plane, though, don’t we? Well, now we do, but 150 years ago the thought that you could pack tons and tons of baggage and instruments and hundreds of people into a tin can and by pushing air at unimaginable speed through smaller tin cans attached to the larger tin can with bolts you would get the thing to fly was unpopular indeed.

Bear with me for a minute here. I’m getting somewhere. Promise.

As I was writing a week ago I was typing frantically and in the process of switching windows I entered “action reaction retraction” into Google, and the last result visible before I had to scroll said “Robert H. Goddard. The New York Times.” which seemed intriguing enough, and following it there were notes on a retraction that seemed almost too appropriate. Really? was the thought, so I went to the Times archives and found the quote, but in the process lost the bizarre way in which I stumbled onto it. I spent almost an hour yesterday, I kid you not, going through the browser’s history to see what I’d done, and I still can’t remember why I was typing that except to think that I must have read this before, and further googling just for the quote shows that it’s been mentioned a few times in the last several years. Sarah Lacy included the quote in her followup, along with her own thoughts regarding an earlier Times story on Tesla Motors which shows if not a pattern at least some concordance of mistakes all going in the same direction, or misdirection.

The quote was a retraction from the Times in which it acknowledges:

“Further investigation and experimentation have confirmed the findings of Isaac Newton in the 17th century and it is now definitely established that a rocket can function in a vacuum as well as in an atmosphere. The Times regrets the error.”

This was triggered by Apollo 11’s flight, when, one presumes, a 50-year-old takedown of rocket pioneer Robert Goddard on the very pages of the Times might have come to their attention:

“That Professor Goddard, with his ‘chair’ in Clark College and the countenancing of the Smithsonian Institution, does not know the relation of action to reaction, and the need to have something better than a vacuum against which to react — to say that would be absurd. Of course he only seems to lack the knowledge ladled out daily in high schools.”

The Times regrets the error. This reminded me of what we could call the case of Catholic Church v. Galileo. At least the Vatican actually apologized to Galileo directly, although in fairness to the Times, it took the Vatican closer to 400 years to get to that point.

The reason I bring up the quote again is that there’s a certain tone of mischief detectable in it, since no one can possibly believe that they are seriously a) realizing just now that rockets actually work in a vacuum and b) that the way to correct for this is to say that “this confirms the findings of Isaac Newton.” Points for whoever wrote it: it was funny.

And just to be clear: this isn’t about giving a pass to the Times, but to try to figure out why this seems to be a recurring problem from which the Times seems far from exempt, even when we may be inclined to think they are exempt from it.

The question is, then, why would they, the nebulous “they” that nevertheless is actually people, talented as they may be, have originally thought that trashing Goddard, someone with enough credentials to presumably give him the benefit of the doubt at the very least, was a good idea?

Perhaps because in doing that they were reflecting, ahem, the times — the prevailing sense of what was or wasn’t possible in the age. The “truth” as they saw it, because truth and facts are two different things. To top it off, in this particular case a giant rocket traveling at some 11,000 meters per second was, as an undeniable fact, still very much in the future, but when the rocket was actually up there, actually carrying three people and countless gizmos and measuring devices and chemicals of all kinds, you didn’t have to know anything about physics to realize that there was something to this seemingly crazy idea of rockets in space after all.

Back to trusting airplanes at last: We trust the plane because we see it. We feel, down to our bones, the effort of the engines as it takes off and lands. If someone started to argue that the typical turbine was somewhat wasteful, I don’t think I’d be alone in thinking Well, while I’m inside the plane and in the air, I’d prefer a little waste to not being, you know, alive.

So is there something to the idea that, in the popular imagination, not seeing is disbelieving, to invert the well-known dictum?

More importantly, given the complexity and sheer scale of the systems involved in running the Internet, what would it take to “see” when what we’re talking about can’t, ever, actually be seen?

…AND YOU ALWAYS FEAR… WHAT YOU DON’T UNDERSTAND…

That’s a line from Batman Begins uttered by Mafia mob boss Carmine Falcone while he is explaining to a young Bruce Wayne why he should just stop acting all flustered about crime and go home. It’s a critical line not only in the film but in the overall story arc of the trilogy, since within it we find Bruce Wayne’s drive to become Batman. Bruce agrees with Falcone’s thesis but not his solution, decides to understand, disappears into the underworld, then returns, seven years later, as Batman.

Understanding — not fearing — takes knowledge, and knowledge takes a long time and effort to develop.

Convincing people that rockets could fly in space “only” required that we actually fly a rocket in space. What would be the equivalent for getting people to accept that how data centers work is not some perennial waste, where secret gerbils run mindlessly within wheels, most of the time doing nothing at all, wasting energy and in the process laying waste to the planet as well? Well, one way would surely be to get everyone to spend the equivalent of Wayne’s “seven years in the underworld,” which in this case would be not only getting a degree in computer science but spending a good amount of time down in the trenches, seeing firsthand how these things are actually run.

That this is an impractical solution, since we can’t have the whole planet get a CS degree or work in a data center, is obvious. It leaves us with the alternative of using analogies and metaphors to express what people still haven’t internalized, and probably will never be able to internalize, in the way that they have the concept of a rocket or an airplane. Before planes flew, the idea of them also had to be wrapped in analogies and metaphors, usually involving birds. The concept of a factory would have undoubtedly required some heavy analogies to be explained to people in, say, the 16th century. We grasp at something that is known to make the unknown intelligible.

The analogies we choose matter, however. A lot. Which is why I keep talking about planes not factories. A modern commercial jet is a much more apt analogy for the type of “waste” involved in running a modern data center.

There is waste and pollution involved in running a jet, as anyone can plainly see. Sometimes the waste is obvious (empty seats), sometimes it’s not (unnecessary circuitry), but generally people don’t doubt that the good people at Boeing et al. are always doing their damnedest to make the plane as efficient, safe, and effective as possible. The same is true of Internet infrastructure.

WHERE WE FINALLY GET BACK TO SANTA CLAUS CONQUERING MARTIANS

You may or may not agree with the plane analogy, there may be better ones, there are more things to discuss and there certainly is a need for us in the industry to engage more broadly and try to explain what’s going on as long as everyone in the world doesn’t have a CS degree (a man can dream).

So for all the faults I could find with the article, I think it was good that it triggered the conversation, and herein lies our second conundrum.

This “conversation” — it will require effort to be carried out.

A brief detour: reading “Days of Rage,” a commentary in the latest issue of The New Yorker, which references Santa Claus Conquers The Martians while talking about the “Muslim Rage” of recent days over a YouTube video no one had actually seen, certainly not before the protests. I agree with a lot of the article, except on one point:

“The uproar over “Innocence of Muslims” matters not because of the deep pathologies it has supposedly laid bare but because of the way the film went viral.”

Psy and Gangnam Style was viral. This video wasn’t. If anything, from what we know, it seems to be quite the opposite of viral, since apparently it was simply an excuse used by people in power to rile up the unhappy (there’s that word again) masses so they could have something to do: “Angry? Unemployed? Bored? Feel you have no future? Here, go burn an embassy.” And how irrationally angry you have to be to somehow find that looting and burning and killing either solves a problem or makes up for anything or is even, just, a remotely justified way to react. How displaced you have to be from yourself and disconnected from what surrounds you. I can hypothesize, only. At points in my life I’ve had little or no money but never felt in a way that would ever lead me to react in that way. Not that this is about money, I know, it’s just one of the factors (probably), but one that I can try to relate through. But I digress.

SCCTM is indeed an actual movie and the reason I bring it up is that I had seen it years ago in an MST3K episode, and when remembering that it occurred to me that what happened in the Middle East was a more, perhaps the most, extreme version of a pervasive phenomenon, that of reacting to what our perception is of something rather than to the thing itself.

Mind you, this isn’t one of those “things were better in my time” type of arguments. While there was a time decades ago when in-depth roundtables in media were more common fare, this happened in an environment in which the amount of raw data to process was far, far less than it is now. We are overwhelmed by data but lacking in information. This isn’t a matter of access to technology, either. I’d bet a lot of the people doing the burning and killing in Benghazi had cellphones. We all do.

This, deep in the weeds of this post (essay?), is what triggered the topic in my head. The end of the chain of associations: that what we’re often doing these days to handle all the information that we’re exposed to would be tantamount to MST3K dispensing with the actual viewing of the movie and simply skipping to the part where we make fun of it. It wouldn’t be the same, would it? Context is critical, but we react in soundbites and generate storms of controversy over a few words which can’t possibly have context attached, because there’s simply no space for it, anywhere.

Twitter and to some degree Facebook are often blamed, unfairly I think, for a supposed devolution of our society into people trapping their thoughts into contextless cages 140 characters in size. I don’t think there’s any question, though, that we humans are and have always been lazy if we can get away with it, and that the deluge of information leaves us with little time to reflect on it, so the mind recoils and defends itself with quips and short bursts, and Twitter (and Facebook) are a good mechanism for that. It just so happens that this constant jumping around topics superficially is both a) effective as a dopamine release mechanism (read: addictive) and b) the perfect way of thinking of yourself as informed and on top of everything and yet truly involved in nothing. Why aren’t Twitter or Facebook to blame, then? Let me give you a Twitterless example: sad advertisement on TV, people starving, a catastrophe somewhere. Text a number and give $3. Done. Back to watching Jersey Shore, or 60 Minutes, or whatever.

Twitter, Facebook, all of them, are not the proximate cause. They are an effect. A reaction.

The environment we live in has fundamentally changed because there is readily available, quite simply, more data about everything, a large part of which is a barrage of trivia and gossip — which is to be expected since they are, ahem, trivial to generate. If Lindsay Lohan having a traffic accident is enough to generate massive news coverage and the cascade of reaction that follows, topics that are deeper and more complex and are more difficult to grasp will find it hard to compete.

It’s something new, or relatively new in historical terms, and I don’t think we know how to handle this deluge yet. We are drinking from a seemingly limitless flood of information but we haven’t yet figured out how to close the faucet every once in a while. We don’t necessarily drown in it but this flood that is constantly rushing around us leaves us with no time to reflect on any one point.

Information overload! Pfft. This isn’t a new idea! I bring it up because I think that we are increasingly using (creating) media that is suited to how we are trying to deal with it, and the edifice we construct with all of it is not well-optimized to transmit complex ideas (this, also, is not at all original), and so it seems critical that we work hard at finding the right metaphors and analogies, the right tools to talk about how the machinery of the Internet works. Tools and machinery, here, somewhat ironically encapsulating the point.

AND NOW FOR THE SURPRISINGLY SUCCINCT CONCLUSION

Analogies matter, metaphors matter, and we need to find better ones to talk about what the Internet is (for example, a “global village” it is not, and this term has luckily fallen by the wayside, but the many reasons why will have to wait for another time). We also have to contend with a shifting media environment in which a conversation like this can get all too easily lost in the noise, not because, as a cynical interpretation would have it, people only care about Snooki or the Kardashians or whatever, but because until we figure out how to live and engage with complexity when soaking in data there will be only surface and precious little depth.

And if there’s an additional meta-point to “Power, Pollution and the Internet,” something else that is important beyond the specifics in the article, it is that we as an industry have left a void that can be filled with anything, and if we don’t engage and try to make what we do more comprehensible for everyone who, rightly, doesn’t have the time to understand it because they’re busy running the rest of the world, then we in the industry have no one to answer to for it but ourselves.

a shocking new way to get google maps on iOS 6

  • Step 1: Visit maps.google.com.
  • Step 2: (optional) save shortcut to homescreen.

Hm.

PS: Yes, defaults matter, but the native app was never that much better than the web app.
