Sunday, 20 January 2008

Crunching data on the cheap

This is one of the more interesting uses for the Amazon S3+EC2 infrastructure I've seen: converting 11 million archived New York Times articles from scanned TIFF files to PDF in under 24 hours.

Wow. Supercomputing with no academic infrastructure, no Crays, and no months of planning. Oh, and batch processing without the Sun Grid. (Read the comments here to find out what the Sun Grid is not, and why it should've been great for this kind of app even though it's not really intended for web apps.)

This TIFF-to-PDF conversion job shows two trends that we will be seeing a lot more of: firstly, a shift to hyper-concurrent processing as the default, and secondly, the rapid provisioning and deprovisioning of virtual machines to perform specific jobs. When crunching large data sets for specialised jobs, it's often the case that significant infrastructure is needed only for short bursts of time. This example shows clearly that getting hold of that infrastructure is now operational expenditure, not capital expenditure. Any CFO who hasn't worked that out yet, and started telling the IT boys that they are probably decreasing shareholder value every time they buy a server, needs to wake up and smell the coffee.
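To make the "hyper-concurrent" point concrete, here is a minimal Python sketch of fanning a batch of independent conversions out across a pool of workers. It is not how the New York Times job was actually built; the names convert_tiff_to_pdf and convert_batch are hypothetical, and the converter is a stand-in that only maps file names so the example runs anywhere.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def convert_tiff_to_pdf(tiff_name):
    """Hypothetical stand-in for a real converter: it only maps the file name."""
    stem, _ext = os.path.splitext(tiff_name)
    return stem + ".pdf"


def convert_batch(tiff_names, workers=8):
    """Run independent conversions across a pool of workers.

    Each file is converted on its own, so the batch can be spread over as
    many workers (or rented machines) as are available. Results come back
    in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_tiff_to_pdf, tiff_names))


pdf_names = convert_batch(["page-0001.tif", "page-0002.tiff"], workers=2)
```

Because no file depends on any other, the same shape scales from a thread pool on one box to a fleet of virtual machines that exists only for the duration of the job.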

There's a whole cottage industry springing up around the Amazon services as folk take those utility tools and build specific applications on top of them. I wonder if Amazon is watching that space, and whether it will start acquiring companies that provide domain-specific autonomics on top of its infrastructure? My guess is that within as little as 12 months people will rarely be buying services direct from Amazon, but rather from a value-added reseller who provides something directly usable out of the box. As that happens, will Amazon want to get closer to the money? Especially as other utility compute providers enter the market and there is consolidation and normalization in the "storage + virtual machine on demand" area. I don't know the answer to that one; it's gonna be interesting to watch.

In the world of international calling minutes there is so much similarity between the different termination providers that minutes are traded electronically. After all, a phone call is a phone call is a phone call. Your call with your provider may take a different route today from the one it takes tomorrow. This is known as Least Cost Routing, and it looks a lot like most other electronic trading systems when you dig deep enough.
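At its simplest, Least Cost Routing boils down to picking the cheapest provider that can take the call right now. The Python sketch below shows only that selection step; the Quote type, the carrier names, and the prices are all made up for illustration, and a real routing engine would also weigh quality, capacity, and contract terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    provider: str            # hypothetical carrier name
    price_per_minute: float  # made-up price, in any one currency
    available: bool = True   # whether the carrier can take the call now


def least_cost_route(quotes):
    """Return the cheapest available quote, or None if nobody can take the call."""
    candidates = [q for q in quotes if q.available]
    return min(candidates, key=lambda q: q.price_per_minute, default=None)


# Today's (invented) quotes: the cheapest carrier is unavailable,
# so the call goes to the next cheapest one.
today = [
    Quote("carrier_a", 0.021),
    Quote("carrier_b", 0.018),
    Quote("carrier_c", 0.015, available=False),
]
chosen = least_cost_route(today)
```

Swap "price per minute" for "price per machine-hour" and the same function would pick a compute provider instead of a carrier.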

I wonder if the same will apply to utility computing? Least Cost MIPS?

Tim Stevens
