Archive for the ‘Cloud’ Category

Map-Reduce With Ruby Using Hadoop

High Scalability – Map-Reduce With Ruby Using Hadoop.

Map-Reduce With Hadoop Using Ruby

A demonstration, with repeatable steps, of how to quickly fire up a Hadoop cluster on Amazon EC2, load data onto HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps should be reproducible on other platforms running bash and Java.
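To give a feel for what a map-reduce script in Ruby might look like, here is a minimal word-count sketch. It assumes the scripts would be run through Hadoop Streaming, where the mapper and reducer exchange tab-separated key/value records and the reducer receives its records sorted by key. The function names (map_line, reduce_records, run_locally) are hypothetical, chosen for illustration, and the map, sort, reduce flow is simulated locally rather than run on a cluster.

```ruby
# Word-count sketch in the Hadoop Streaming style.
# All function names here are hypothetical, not taken from the article.

# Mapper: turn one line of text into "word\t1" records.
def map_line(line)
  line.downcase.scan(/[a-z0-9']+/).map { |word| "#{word}\t1" }
end

# Reducer: sum the counts of records that arrive sorted by key,
# emitting one "word\ttotal" record each time the key changes.
def reduce_records(sorted_records)
  totals = []
  current_word = nil
  current_count = 0
  sorted_records.each do |record|
    word, count = record.chomp.split("\t", 2)
    if word != current_word
      totals << "#{current_word}\t#{current_count}" unless current_word.nil?
      current_word = word
      current_count = 0
    end
    current_count += count.to_i
  end
  totals << "#{current_word}\t#{current_count}" unless current_word.nil?
  totals
end

# Local stand-in for the framework: map every line, sort, then reduce.
def run_locally(lines)
  reduce_records(lines.flat_map { |line| map_line(line) }.sort)
end

puts run_locally(["the quick fox", "The lazy dog"])
```

In a real job the mapper and reducer would typically live in separate scripts that read from standard input and write to standard output. Keeping the logic in plain functions, as above, makes it easy to check on a local machine before touching a cluster.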

Fire-Up Your Hadoop Cluster

I chose the Cloudera distribution of Hadoop, which is still 100% Apache licensed but has some additional benefits. One of these benefits is that it is released by Doug Cutting, who started Hadoop and drove its development at Yahoo! He also started Lucene, which is another of my favourite Apache projects, so I have good faith that he knows what he is doing. Another benefit, as you will see, is that it is simple to fire up a Hadoop cluster…

Amazon Linux AMI – what distro

Re: Amazon Linux AMI – what distro is this based on?

The Amazon Linux AMI is based on RHEL 5.x and parts of RHEL 6. One of our goals is binary compatibility with RHEL 5.x, and therefore CentOS 5.x. Astute users will note that our kernel is based on 2.6.34, and we have engineered the image to conform to a cloud environment. For example, the lack of Xorg support helps to keep the images small and lean. The goal of the Amazon Linux AMI is to provide an image for use in the cloud and to serve as a reference image of EC2 best practices. The maintenance (security, enhancements, features and bug fixes) for the image will come directly from Amazon, while maintaining maximum compatibility, security and functionality.

via Amazon Web Services Developer Community: Amazon Linux AMI – what distro is this …