diego's weblog

there and back again

Monthly Archives: April 2008

microhoo: what’s next

As we wait with bated breath for the next move in the Microsoft/Yahoo saga, Marc just posted a must-read entry on what happens if Microsoft goes fully hostile. Meanwhile, TechCrunch has some speculation.

Another interesting question is what exactly Microsoft is buying here. Many people would leave, many have already left, and there’s nearly complete overlap between the two companies’ technologies and products (see here). In most cases full integration (rather than an orderly migration) would be a nightmare that anyone in their right mind would avoid. The easiest path would be to just point yahoo.com to live.com, automatically migrate accounts, and you’re done. Well, of course, not really, but you get what I’m saying.

If so, this would be the most expensive domain name acquisition ever. Business.com for $7.5 million during the bubble? Peanuts, I say. 🙂

Professor Knuth gets many right, at least one very, very wrong

An interesting interview with Professor Donald Knuth (via slashdot, where the slashdotter writes: “[Knuth] pitches his idea of “literate programming” which I must admit I’ve never heard of but find it intriguing”. Really? Never “heard” of that? I think it was one of the first things in programming I “heard” of… anyway…). He makes some interesting comments on various topics, including TeX, which continues to be nearly irreplaceable in some areas (academic papers in particular), but then he drops a bomb:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Titanium” approach that was supposed to be so terrific–until it turned out that the wished-for compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.

How many programmers do you know who are enthusiastic about these promised machines of the future?

I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I’m wrong.

I know that important applications for parallelism exist–rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose

First, his mistake of calling Itanium “Titanium” is a clue here: he hasn’t kept up with the field. He’s mixing up VLIW, which is what Itanium was based on, with multicore. VLIW is really designed around superpipelining, self-draining pipelines, and other features in processors and may have been left in the dust by Moore’s Law and the increased use of virtual machines, just as RISC has largely been sidelined. Additionally, many VLIW ideas have found their way into CISC processors. In any case, VLIW is about optimizing flow within a single processor, not across many separate processors.

Second, his statement

Surely, for example, multiple processors are no help to TeX.

is just plain wrong. TeX works by building up pages from blocks. A character is a block, characters get built up into words, words into sentences, and sentences into paragraphs. Paragraphs are then put together into pages. There would be a significant advantage to processing those units on different processors and assembling the results afterwards. Granted, sentences and paragraphs are related and one can affect another, but that only means multithreaded TeX processing would be harder to write, not that multithreading would be no help at all.
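To make the idea concrete, here is a minimal sketch in Python. It is not how TeX works internally: `break_paragraph`, `build_pages`, and `typeset` are names I made up, line breaking is the standard library's greedy `textwrap.wrap` rather than TeX's algorithm, and the pages are plain fixed-size stacks of lines. The only point is the shape: paragraphs are broken independently and concurrently, then combined in order by a sequential step. (In CPython, threads won't actually speed up CPU-bound work like this; swapping in a process pool would be the obvious next step.)

```python
from concurrent.futures import ThreadPoolExecutor
import textwrap


def break_paragraph(paragraph, width=40):
    """Break one paragraph into lines without looking at any other paragraph."""
    # Greedy wrapping from the standard library, NOT TeX's line-breaking algorithm.
    return textwrap.wrap(paragraph, width=width)


def build_pages(broken_paragraphs, lines_per_page=5):
    """Sequential step: stack the finished lines into fixed-size pages."""
    pages, current = [], []
    for lines in broken_paragraphs:
        for line in lines:
            current.append(line)
            if len(current) == lines_per_page:
                pages.append(current)
                current = []
    if current:
        pages.append(current)
    return pages


def typeset(paragraphs, width=40, lines_per_page=5, workers=4):
    """Break paragraphs concurrently, then assemble pages in document order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the text stays in sequence.
        broken = list(pool.map(lambda p: break_paragraph(p, width), paragraphs))
    return build_pages(broken, lines_per_page)


# Made-up sample text, just to exercise the pipeline.
paragraphs = ["alpha beta gamma delta " * 6, "one two three four five " * 4]
pages = typeset(paragraphs)
```

The hard part the sketch skips is exactly the one mentioned above: in real typesetting a page break can feed back into earlier decisions, so the combining step is where the difficulty would live.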

“I hear almost nothing but grief from software people,” Knuth says, but this is just because we haven’t yet found clean, effective methods to write, test and debug multithreaded software at large scale. This doesn’t mean it’s bad, it just means we haven’t figured out how to use it properly. And, worst case scenario, you can always write your software as services connected through a thin layer of communication (based on anything from IPC to RMI to REST) that can then run as multiple processes happily on a multicore machine.
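As a rough illustration of that worst-case approach, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the "service" is a toy worker that sums numbers, the thin communication layer is one line of JSON over stdin/stdout pipes (standing in for IPC, RMI, or REST), and `run_services` and `WORKER_SOURCE` are hypothetical names. Each worker is a separate OS process, so the operating system is free to schedule them on different cores.

```python
import json
import subprocess
import sys

# Hypothetical worker service: reads one JSON request per line on stdin,
# writes one JSON reply per line on stdout.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "result": sum(request["numbers"])}
    print(json.dumps(reply), flush=True)
"""


def run_services(jobs):
    """Start one worker process per job and collect the replies in job order."""
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in jobs
    ]
    # Send every request first so the workers can run at the same time.
    for job_id, (worker, numbers) in enumerate(zip(workers, jobs)):
        worker.stdin.write(json.dumps({"id": job_id, "numbers": numbers}) + "\n")
        worker.stdin.close()  # flushes the request and signals end of input
    results = {}
    for worker in workers:
        reply = json.loads(worker.stdout.readline())
        results[reply["id"]] = reply["result"]
        worker.stdout.close()
        worker.wait()
    return [results[job_id] for job_id in range(len(jobs))]


totals = run_services([[1, 2, 3], [10, 20]])
```

No shared memory, no locks: each process only sees the messages it is sent, which is what makes this style easier to reason about than threads, at the cost of serialization overhead.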

He also rues that literate programming hasn’t been embraced by the millions, but this is only partly true. Java and other languages have been using literate programming concepts for years, and this has been a major productivity advantage when using good IDEs like IDEA, NetBeans, or Eclipse.

Not that this invalidates anything that Knuth has done. 🙂 The TeXbook and The Art of Computer Programming are still gems that I end up perusing frequently. They may not be multicore-enabled, but they’re still as relevant as ever.

ning news: fast company article, new round, new releases

A few Ning networks, viewed as graphs

A couple of Ning-related news/articles hit the interwebs in the last couple of days — first, there was a Fast Company article, Ning’s Infinite Ambition, that covered viral loops and what is behind a lot of the recent high-growth Internet sites, and it’s definitely worth a read.

Then today VentureBeat broke the news of our recent investment round. Marc has more details. As he says, VentureBeat found a mandatory SEC filing and put out news that we’d otherwise not have talked about, since there isn’t a clear reason to be doing it. The reasons for the round, as Marc says:

We raised the money to enable us to keep scaling given our accelerating growth (over 230,000 networks on Ning now, growing at over 1,000 per day) and to make sure we have plenty of firepower to survive the oncoming nuclear winter. At current growth rates, we don’t need it to get to cash flow positive, but having lived through the last crunch, it’s good to be conservative with these things.

Meanwhile, the rolling train of releases continues apace, most recently with Events and Notes, two features that were very well received. It’s really great to see users react right away with feedback on both the good and the bad. The Developer Network also keeps growing, with a recent reorganization of docs to improve navigation. We’ll be doing a lot more of that in the coming months, as we have been in the past months.

A lot done, still more to do. 🙂


RRD4J. 100% Pure Java Implementation of RRDTool. Very cool. ‘xxx4j’ names were old in 1998. Java is already one of the main systems languages. No need to ‘4j-ify’ everything.

summer movie prediction: rockin’!

Sometimes it’s hard not to wish that Hollywood would get its act together and produce some actual entertainment instead of drivel, or at least not ration the good stuff out one movie a year. This year definitely looks like an outlier. The list so far:

  • Indiana Jones and the Kingdom of the Crystal Skull. Come on, admit it. You’ve been waiting for this for the last ten years. Yeah. Me too.
  • Iron Man. Robert Downey Jr., one of the best actors of his generation, and what looks like a killer action movie.
  • The Dark Knight. Christian Bale, Heath Ledger, Gary Oldman, Michael Caine, once more directed by Christopher Nolan. After the spectacularly good Batman Begins, I have little doubt this will be the one that finally, finally, lets us Batman fans forget about what Joel Schumacher did to the character with Batman and Robin.
  • Wall-e. The latest from Pixar. I’d be surprised if the movie isn’t as hilarious as the trailer. Fun to watch for the CG tricks alone.
  • The Incredible Hulk. Second try at getting the big green man on the big screen. The first try wasn’t great but it had some good moments. This time Bruce Banner is Ed Norton, who can summon enough intensity to turn green all by himself (just remember American History X). High hopes for this one.
  • Righteous Kill. Al Pacino and Robert DeNiro. Need I say more?
  • Forbidden Kingdom. Jackie Chan and Jet Li go Kung-Fuing. Much fun is to be had by all.
  • Get Smart. Part of what was funny about the TV series was the crappy faux-James Bond tech, but this time it looks like the effects make it perhaps a little too slick. Steve Carell definitely looks like the natural heir for the part.
  • Bangkok Dangerous. Nick Cage tries the long-hair-weirdo-distanced-from-society thing again (last time it was Next), as a professional assassin who apparently grows a conscience. Might be a bomb, but willing to give him the benefit of the doubt.
  • Hancock. Will Smith goes superhero with an attitude, is (apparently) also a bum. What can possibly go wrong?
  • Hellboy 2: The Golden Army. After the spectacularly good Hellboy, I hope this one doesn’t succumb to sequel-syndrome.
  • Wanted. Angelina Jolie and Morgan Freeman teach some dude that he can curve the trajectory of bullets. A premise this insane has to lead to a fun movie.

MS 1, blogsphere 0

Microsoft 1, blogsphere 0. Heh. Sure.

why no one likes internet statistics sites

Alexa just changed their ranking system, Techcrunch has details. (The Alexa page, in typical clueless fashion, is not a permalink, so I won’t bother linking to it). This is good news, but it still doesn’t fix the problem. Why?

Alexa, Compete, Quantcast, and Comscore all have measures of “Rankings” or “Popularity” that place sites relative to others. These rankings are, in general, a more-or-less accurate relative description of how a site is doing.

The keyword here, though, is relative, and, more specifically, relative to how the service in question measures other sites. Comparing a Quantcast ranking to a Compete ranking to an Alexa ranking for any given site is useless, and no one even attempts that.

Rankings for each site are widely understood to be relative within its own index, and no one has a problem with that. So far so good.

The real problem starts when we look at their measures of Visitors (Alexa calls that “Reach” as far as I know) and Visits. So some news publication may take a Comscore measurement and say “such and such a site has 1,000,000 visits”, inevitably prompting discussion of whether it’s true or not and (generally) silent fuming on the part of the site that sees different data in its logs every day.

This is the problem. The use of a heavily overloaded term like “Visitors” or “Visits” or “Pageviews” or even “Reach” causes the confusion. None of these sites claim that they have the ultimate truth at their disposal, but by using common terms, this is exactly what happens. After all, everyone knows what a ‘Visitor’ is, right? Well, maybe. Maybe everyone does know what a Visitor or a Visit is, but no one agrees on the definition.

Even if everyone did agree on the definition, you could still lose data unless you’re looking at the logfiles for a service. Why? Several reasons, but the three key ones are:

  1. The services extrapolate traffic based on measures they choose. Try a site that has low traffic, and the services give up. “Not enough data.” Yep.
  2. Not only that, the services extrapolate based on previously filtered data. The raw log data for a year for any of the top 1,000 sites can be counted in the petabytes, if not exabytes. None of these services have enough compute power or storage at their disposal to process that, clearly. So they are pre-filtering information, which is then extrapolated. The prefiltering presumably eliminates bots. What if a new bot shows up? How do they count it?
  3. Domain mapping. Consider Ning. Suppose we agreed on what a pageview or a visit is. Would Alexa or Compete get the right numbers? No. Because they key on domains to define what is traffic to a service. In our case (as in many others) the service allows you to do domain mapping of your site, which means the services think that the traffic is going somewhere else, even though it’s going to Ning. That’s why I can assert without hesitation that the visits/visitors traffic reported by these services doesn’t cover a good portion of the traffic Ning handles, and when on top of that you add uncertainty as to what is a visit, and what is a visitor (is it an IP? a cookie? A combination? What about internet cafes? and so on), then the actual absolute value for those things is pretty much meaningless.
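A toy example makes the last point easier to see. The sketch below is Python with an entirely made-up request log: the hosts, IPs, and cookie IDs are hypothetical, and the function names are mine, not anything these services actually run. It counts pageviews the way a domain-keyed service would versus what the service really handled, and counts "visitors" under two equally plausible definitions.

```python
# Hypothetical request log. Hosts, IPs, and cookie IDs are made up.
REQUESTS = [
    {"host": "cooks.ning.com", "ip": "10.0.0.1", "cookie": "a"},  # internet cafe,
    {"host": "cooks.ning.com", "ip": "10.0.0.1", "cookie": "b"},  # three people
    {"host": "cooks.ning.com", "ip": "10.0.0.1", "cookie": "d"},  # behind one IP
    {"host": "www.example.org", "ip": "10.0.0.2", "cookie": "c"},  # domain-mapped
    {"host": "www.example.org", "ip": "10.0.0.3", "cookie": "c"},  # same person, new IP
]

# Custom domains that are really served by the same service (assumed mapping).
MAPPED_DOMAINS = {"www.example.org"}


def on_service_domain(host, service_domain):
    """True if the host is the service domain itself or one of its subdomains."""
    return host == service_domain or host.endswith("." + service_domain)


def pageviews_keyed_on_domain(requests, service_domain):
    """What an outside service sees if it keys purely on the domain name."""
    return sum(1 for r in requests if on_service_domain(r["host"], service_domain))


def pageviews_served(requests, service_domain, mapped_domains):
    """What the service itself handled, including domain-mapped sites."""
    return sum(
        1
        for r in requests
        if on_service_domain(r["host"], service_domain) or r["host"] in mapped_domains
    )


def visitors(requests, key):
    """Count distinct visitors under one definition: key is 'ip' or 'cookie'."""
    return len({r[key] for r in requests})


domain_keyed = pageviews_keyed_on_domain(REQUESTS, "ning.com")
served = pageviews_served(REQUESTS, "ning.com", MAPPED_DOMAINS)
by_ip = visitors(REQUESTS, "ip")
by_cookie = visitors(REQUESTS, "cookie")
```

With this log, the domain-keyed count sees 3 of the 5 pageviews, and the same traffic is 3 visitors by IP but 4 by cookie. Same requests, three different "truths", and that's before any extrapolation or filtering.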

Now, what the services do track correctly in a lot of cases is trends, especially over several months (again, in the case of Ning the fact that they miss domain-mapped networks is a big problem, but the data we have for non-domain mapped networks shows similar, and I say again, similar shapes and trajectories).

My take is that if these services stopped calling what they measure “visits” or “visitors” and just said these were some sort of generic “[servicename] traffic measure” or something, then they would get a lot more respect, something they deserve since they do provide a valuable service, and they should get credit for that.

mowser ends

Nothing like some breaking news to wake you from a blog slumber. Russ announced on his blog the end of Mowser.
Sad news, and I sympathize. Russ and Mike are buddies so I’ve followed the travails of Mowser more closely than, well, almost anybody, so this is also a bit more personal for me. I’ve been there. My previous startup, clevercactus, also ran out of money after I had put everything I had into it, and I experienced something similar (although a bit less drastic) in terms of financial impact. A lot of us are used to abundance and generally have absolutely no idea how stressful it is (to put it mildly) to have to choose what food to buy to avoid breaking the bank.

The good news in all this is that failure is a HUGE learning opportunity, something that isn’t said enough. Throughout the process you’re consumed by trying to make it work, but once it’s done you can look back and find a lot of things to do differently in the future. Yes, in the future: I don’t believe that you can really say that you’d have ‘done things differently’, since generally we make the best decisions we can with the information we have available at the time. Additionally, the next step (after a period of recovery) after having put everything into something that didn’t work can actually be very refreshing and lead to amazing opportunities, as it did for me.

There is also some solace to be found in the online response. When you work with a team of a few people or more you can help each other, but when you work on your own or with a partner it’s a harder situation, and the online response helps a lot. In the case of Mowser, it’s been at the top of techmeme for a while now and it’s been covered and discussed all over the place, in part because of Russ’ statement that ‘the mobile web is dead’ (more on that later), but also due to comments of support.

As for Mowser, it’s online for the time being and as Mike says they’re looking to sell the site or code, which is in my mind a very possible outcome. It would be cool to see it live on in some form.
