Archive for January, 2011

Google App Engine GAE vs Amazon Elastic Computing EC2 vs Microsoft Azure

Google App Engine GAE vs Amazon Elastic Computing EC2 vs Microsoft Azure.

Almost a year ago, I compared Google App Engine and Microsoft Windows Azure, trying to decide which platform I should write and host my blog (and some other small projects) on. The comparison was about more than hosting – the languages and frameworks used would be influenced by the platform I was hosting on. There were also APIs available only to one platform, or easier to use on one platform compared to the other (such as the App Engine authentication).

Due to the huge differences, I did a little homework on each platform, and ultimately, it came down to price. The difference in pricing between Google App Engine and Windows Azure was so enormous, that there wasn’t really a decision to make. App Engine hosts this blog for free. Windows Azure would’ve cost around $100/month minimum.

One Year On

Fast forward a year and things have changed a little. App Engine has become more mature, Amazon has introduced Micro instances, and Microsoft has done, well, not a lot. There’s been seemingly zero change in the pricing for Windows Azure, meaning there’s still a significant minimum cost in using it.

Windows Azure vs Amazon EC2 vs Google App Engine – Stack Overflow

Reasons to use GAE:

  • You don’t pay until your app grows quite a bit. With Azure, you pay almost $100 each month, even if you don’t have a single website visitor. If your db goes over 1GB, you pay an extra $90 ($9->$99) for storage.
  • GAE’s payment is also very fine-grained – only for the resources you use. Azure (and AWS) is “blocky” – you pay something for each server instance you’ll run (plus resources), irrespective of whether it gets any use at all.
  • GAE has the lightest admin load. Once you’re setup, deploying and re-deploying is quick and they’ll auto-everything. For example, you don’t worry about how many servers your app is using, how to shard the data, how to load-balance.
  • Mail just works. At the time of writing, Azure doesn’t offer SMTP out so you need a 3rd party server.
  • Great integration with many of the Google offerings – calendars, mail, whatever. You can delegate user management to Google if you don’t want control over your user base.
  • With GAE you know any features they add to the store, you’ll get. With Azure, you get the feeling Sql Azure Database will get most of the love but it’ll be more expensive. Azure Storage is likely to have the most gotchas. No relational integrity, no order-by, you’ll fiddle with the in-memory context more. GAE’s store has far fewer restrictions and more features than Azure Tables.
  • Good choice if you’re using Python or JVM-based languages already. Many languages compile to Java bytecode nowadays.
  • Updating the app is very fast. For Python, I had a shortcut key setup and it took no time at all. I now use the Eclipse Plugin for Java and it works very well. Azure is more fiddly.
  • A locally tested app will probably run on the cloud without (much or any) changes. With Azure, the config is different and I spent some time stopping-deleting-building-uploading-starting before I got it right.
  • GAE has a great UI that includes a log viewer a data editor. With Azure, you currently have to find external viewers/editors for this.
  • GAE lets you have multiple versions of your application running on the same datastore. You can deploy, test a version and then set the current ‘live’ version when you’re ready. You can change back if something goes wrong.
  • Reasons to use Azure:

  • I’ve read of users getting “I think you’re a robot” messages, like sometimes pop up on other Google properties. I haven’t seen it, but it would alarm me.
  • Azure seems to be better designed if you have a SOA-type approach. Their architectures seem to benefit from experience in the enterprise world. GAE seems more focused on simply serving web pages.
  • You can run the app under debug, put in breakpoints, etc.
  • Azure has a “staging” environment where you can deploy to the cloud, but not make it live until you’re happy it works.
  • I’m using .Net for other things, and integrating them with .Net on the backend is much easier than with GAE. (Update – using Java on GAE works fine, and the 10-second timeout is now 30 seconds).
  • Azure has two approaches to storage, offering more choice. They are SQL Azure Database (SAD) which is a relational DB, and Azure Storage, which consists of non-relational tables, blobs and queues. If you have an investment in SQL Server then SAD will be easy to move to, but is quite costly and might be less scalable.
  • Integration with many MS “Live” offerings.So, no obvious answers. I’m defaulting to App Engine at the moment because of costs and ease of use. I might use Azure for very MS-oriented apps. I use Amazon S3 for downloads but likely won’t use EC2 because I prefer leaving everything under the application level to the experts.
  • Open source web-crawler: Web-Harvest

    Web-Harvest Project Home Page.

    Web-Harvest is Open Source Web Data Extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well established techniques and technologies for text/xml manipulation such as XSLT, XQuery and Regular Expressions. Web-Harvest mainly focuses on HTML/XML based web sites which still make vast majority of the Web content. On the other hand, it could be easily supplemented by custom Java libraries in order to augment its extraction capabilities.

    Process of extracting data from Web pages is also referred as Web Scraping or Web Data Mining. World Wide Web, as the largest database, often contains various data that we would like to consume for our needs. The problem is that this data is in most cases mixed together with formatting code – that way making human-friendly, but not machine-friendly content. Doing manual copy-paste is error prone, tedious and sometimes even impossible. Web software designers usually discuss how to make clean separation between content and style, using various frameworks and design patterns in order to achieve that. Anyway, some kind of merge occurs usually at the server side, so that the bunch of HTML is delivered to the web client.

    Map-Reduce With Ruby Using Hadoop

    High Scalability – High Scalability – Map-Reduce With Ruby Using Hadoop.

    Map-Reduce With Hadoop Using Ruby

    A demonstration, with repeatable steps, of how to quickly fire-up a Hadoop cluster on Amazon EC2, load data onto the HDFS (Hadoop Distributed File-System), write map-reduce scripts in Ruby and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java.

    Fire-Up Your Hadoop Cluster

    I choose the Cloudera distribution of Hadoop which is still 100% Apache licensed, but has some additional benefits. One of these benefits is that it is released by Doug Cutting, who started Hadoop and drove it’s development at Yahoo! He also started Lucene, which is another of my favourite Apache Projects, so I have good faith that he knows what he is doing. Another benefit, as you will see, is that it is simple to fire-up a Hadoop cluster……