Archive for the ‘Tools’ Category

Javascript performance: callback (async) vs Q

Promises/A+ Performance Hits You Should Be Aware Of

The Promises/A+ specification is a fresh and very interesting way of dealing with the asynchronous nature of Javascript. It also provides a sensible way to deal with error handling and exceptions. In this article we will go through the performance hits you should be aware of and, as a side effect, compare the two most popular Promises/A+ implementations, When.js and Q, against Async, the thinnest abstraction over plain callbacks.


[Figure: basic Node.js single-thread event-loop architecture]


The Case

My motivation for looking deeper into the performance of Promises/A+ was a job queuing system I’ve been working on named Kickq. The system is expected to get hammered in production, so stress testing was warranted. After stubbing out all the database interactions, essentially making the job-creation operation synchronous, I was getting odd performance results.

The test was simple: create 500 jobs in a loop and measure how long it takes for all the jobs to finish (sketched below).
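
A minimal sketch of that test, assuming a hypothetical kickq.create(name, callback) signature (the real Kickq API may differ):

var kickq = require('kickq');

var start = Date.now();
var finished = 0;

for (var i = 0; i < 500; i++) {
  // create a job; the callback fires when the job has been created
  kickq.create('stress-test-job', function() {
    if (++finished === 500) {
      console.log('Total time:', Date.now() - start, 'ms');
    }
  });
}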

The measurements were in the ~550ms range and my eyeballs started to roll. “That’s a synchronous operation, it should finish in less than 3ms, WHAT THE????!?!”. After taking a few moments to let it sink in, a suspect emerged: Promises. I had used them as the only pattern for handling asynchronous ops and callbacks throughout the whole project. Brian Cavalier, one of the authors of When.js, helped me pinpoint the real culprit, the tick:

Promises/A+ Specification, Note 4.1 In practical terms, an implementation must use a mechanism such as setTimeout, setImmediate, or process.nextTick to ensure that onFulfilled and onRejected are not invoked in the same turn of the event loop as the call to then to which they are passed.

In other words, per the specification, promises must be resolved asynchronously! That comes with a cost, and apparently a heavy one.
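
The mandated deferral is easy to see in isolation. A plain callback may be invoked in the same turn of the event loop, while a compliant promise must push its handler to a later tick. A minimal sketch, assuming Node’s process.nextTick:

function fetchSync(cb) {
  // plain callback: runs in the same turn of the event loop
  cb(42);
}

function fetchPromise() {
  // Promises/A+-style resolution: the handler must run on a later tick
  return {
    then: function(onFulfilled) {
      process.nextTick(function() { onFulfilled(42); });
    }
  };
}

fetchSync(function(v) { console.log('callback:', v); });
fetchPromise().then(function(v) { console.log('promise:', v); });
console.log('end of turn');
// prints "callback: 42", then "end of turn", then "promise: 42"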

In the process of studying performance I had to create a performance library (poor man’s profiling) and a benchmark test for Promises/A+ implementations that is already being used to optimize future versions of When.js.


Creating The Promises/A+ Benchmark

I tried to broaden the definition of the test case. If an application uses the Promises pattern as the only way to manage how the internal parts interact, we can make a few assumptions:

  • There will be a series of promises chained together, representing the various operations that will be performed by your application.
  • The Deferred Object is used on each link of the chain to control resolution and how the promise object is exposed.
  • Throughout the whole chain of promises some operations may actually be synchronous; we will measure all cases (see the sketch below).
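
Roughly, each measured iteration looks like the following, assuming when.js’s when.defer() API; how each step resolves is what varies between the sync, mixed and async cases:

var when = require('when');

function step(value) {
  var deferred = when.defer();
  // sync case: resolve immediately; async case: resolve on a later tick
  deferred.resolve(value + 1);
  return deferred.promise;
}

var start = Date.now();
step(0).then(step).then(step).then(function(result) {
  console.log('chain resolved in', Date.now() - start, 'ms');
});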

[Chart: Promises, Total Time to Resolve, 500 Loops]

[Chart: Promises, Memory Consumption]

Difference to First Resolved Promise, 500 Loops

Perf Type              | Async   | When 2.1.0 | Q 0.9.5  | Promise 3.0.1
Sync Diff              | 0.01ms  | 36.62ms    | 186.43ms | 63.96ms
Mixed Diff             | 5.37ms  | 41.78ms    | 226.34ms | 83.83ms
Async Diff             | 22.42ms | 58.18ms    | 241.80ms | 93.68ms
Sync Diff vs AsyncLib  | 1x      | 3,662x     | 18,643x  | 6,396x
Mixed Diff vs AsyncLib | 1x      | 7.78x      | 42.15x   | 15.61x
Async Diff vs AsyncLib | 1x      | 2.60x      | 10.79x   | 4.18x

Libraries When.js v1.8.1 and Deferred are not included in this table because they resolve promises synchronously. This difference makes the Diff metric inapplicable.

Total Time of execution, 500 Loops

Perf Type               | Async   | When 1.8.1 | When 2.1.0 | Q 0.9.5  | Deferred 0.6.3 | Promise 3.0.1
Sync Total              | 5.15ms  | 12.35ms    | 72.35ms    | 301.47ms | 71.25ms        | 80.50ms
Mixed Total             | 18.94ms | 40.57ms    | 80.21ms    | 325.49ms | 94.58ms        | 95.67ms
Async Total             | 35.70ms | 50.63ms    | 90.52ms    | 337.82ms | 105.87ms       | 107.01ms
Sync Total vs AsyncLib  | 1x      | 2.40x      | 14.05x     | 58.54x   | 13.83x         | 15.63x
Mixed Total vs AsyncLib | 1x      | 2.14x      | 4.23x      | 17.19x   | 4.99x          | 5.05x
Async Total vs AsyncLib | 1x      | 1.42x      | 2.54x      | 9.46x    | 2.97x          | 3.00x

Average Memory Difference – Single 500 Loop Runs

Perf Type | Async   | When 1.8.1 | When 2.0.1 | When 2.1.x | Q        | Q longStack=0 | Deferred
Sync      | 113.29% | 160.98%    | 840.21%    | 866.88%    | 1106.67% | 684.56%       | 354.07%
Async     | 159.29% | 458.44%    | 811.32%    | 834.63%    | 1110.21% | 691.41%       | 429.18%



How to Cluster Magento, nginx and MySQL on Multiple Servers for High Availability

Magento is an open-source e-commerce platform built on Zend PHP and MySQL. It is widely adopted by online retailers, with some 150,000 sites known to use it. A single-server setup is easy to get going, but if your store is a huge success, you will probably need to think about clustering your environment across multiple servers. Clustering is done at the web, database and file-system level, as all web nodes need access to catalog images.


This post is similar to our previous posts on scaling Drupal and WordPress performance, and focuses on how to scale Magento on multiple servers. The software used is Magento, nginx, HAProxy, MySQL Galera Cluster and OCFS2 (Oracle Cluster File System) with shared storage, on Ubuntu 12.04.2 LTS (Precise) 64bit.

Our setup consists of 6 nodes or servers:

  • NODE1: web server + database server
  • NODE2: web server + database server
  • NODE3: web server + database server
  • LB1: load balancer (master) + keepalived
  • LB2: load balancer (backup) + keepalived
  • ST1: shared storage + ClusterControl


We will be using OCFS2, a shared-disk file system, to serve the web files across our web servers. Each web server will have an nginx instance colocated with a MySQL Galera Cluster instance. Two other nodes will be used for load balancing.

Our major steps would be:

  1. Prepare 6 instances
  2. Deploy MySQL Galera Cluster onto NODE1, NODE2 and NODE3 from ST1
  3. Configure iSCSI target on ST1
  4. Configure OCFS2 and mount the shared disk onto NODE1, NODE2 and NODE3
  5. Configure nginx on NODE1, NODE2 and NODE3
  6. Configure Keepalived and HAProxy for web and database load balancing with auto failover
  7. Install Magento and connect it to the Web/DB cluster via the load balancer


Prepare Hosts


Add the following host definitions in /etc/hosts, substituting your own IP addresses for the placeholders:

<virtual IP>	# virtual IP held by Keepalived
<NODE1 IP>	NODE1 web1 db1
<NODE2 IP>	NODE2 web2 db2
<NODE3 IP>	NODE3 web3 db3
<LB1 IP>	LB1
<LB2 IP>	LB2
<ST1 IP>	ST1 clustercontrol


Disable the sudo password prompt:

$ sudo visudo


And append the following line (assuming the ubuntu OS user that the deployment scripts use; adjust to your own):

ubuntu ALL=(ALL:ALL) NOPASSWD: ALL

Deploy MySQL Galera Cluster


** The deployment of the database cluster will be done from ST1


1. To set up MySQL Galera Cluster, go to the Galera Configurator to generate a deployment package. In the wizard, we used the following values when configuring our database cluster:

  • Vendor: Codership (based on MySQL 5.5)
  • Infrastructure: none/on-premises
  • Operating System: Ubuntu 12.04
  • Number of Galera Servers: 3+1
  • OS user: ubuntu
  • ClusterControl Server:
  • Database Servers:

At the end of the wizard, a deployment package will be generated and emailed to you.


2. Download the deployment package and run it:

$ wget
$ tar xvfz s9s-galera-codership-2.4.0.tar.gz
$ cd s9s-galera-codership-2.4.0/mysql/scripts/install
$ bash ./ 2>&1 | tee cc.log


3. The deployment takes about 15 minutes. Once it is completed, note your API key and use it to register the cluster with the ClusterControl UI. You will now see your MySQL Galera Cluster in the UI.


Configure iSCSI


1. The storage server (ST1) needs to export a disk through iSCSI so it can be mounted on all three web servers (NODE1, NODE2 and NODE3). iSCSI basically tells your kernel you have a SCSI disk, and it transports that access over IP. The “server” is called the “target” and the “client” that uses that iSCSI device is the “initiator”.

Install iSCSI target in ST1:

$ sudo apt-get install -y iscsitarget iscsitarget-dkms


2. Enable iscsitarget:

$ sudo sed -i "s|false|true|g" /etc/default/iscsitarget


3. It is preferable to use a separate disk for file-system clustering, so we are going to share another disk mounted on ST1 (/dev/sdb) among the web server nodes. Define it in the iSCSI target configuration file:

$ vim /etc/iet/ietd.conf


And add the following lines:

Target iqn.2013-06.ST1:ocfs2
        Lun 0 Path=/dev/sdb,Type=fileio
        Alias iscsi_ocfs2


4. Add NODE1, NODE2 and NODE3 to the iSCSI allow list by specifying their network:

$ vim /etc/iet/initiators.allow


And append the following line:



5. Start iSCSI target service:

$ sudo service iscsitarget start


** The following steps should be performed on NODE1, NODE2 and NODE3


6. Install iSCSI initiator on respective hosts:

$ sudo apt-get install -y open-iscsi


7. Set the iSCSI initiator to start automatically, then restart the iSCSI initiator service to apply the change:

$ sudo sed -i "s|^node.startup.*|node.startup = automatic|g" /etc/iscsi/iscsid.conf
$ sudo service open-iscsi restart


8. Discover the iSCSI targets that we set up earlier:

$ sudo iscsiadm -m discovery -t sendtargets -p ST1
<ST1 IP>:3260,1 iqn.2013-06.ST1:ocfs2


9. If you see a result like the one above, it means we can see and connect to the iSCSI target. We just need another restart to access it:

$ sudo service open-iscsi restart
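
To verify that the initiator has actually logged in to the target, you can list the active iSCSI sessions:

$ sudo iscsiadm -m session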


10. Make sure you can see the new hard disk (/dev/sdb) listed under the /dev directory:

$ ls -1 /dev/sd*


Configure OCFS2


** The following steps should be performed on NODE1 unless specified.


1. OCFS2 allows a file system to be mounted in more than one place. Install the OCFS2 tools on NODE1, NODE2 and NODE3:

$ sudo apt-get install -y ocfs2-tools


2. Create a partition table on hard disk /dev/sdb:

$ sudo cfdisk /dev/sdb


Create a partition by using the following sequence in the wizard: New > Primary > accept Size > Write > yes


3. Create an OCFS2 file system on /dev/sdb1:

$ sudo mkfs.ocfs2 -b 4K -C 128K -L "Magento_Cluster" /dev/sdb1


4. Create the cluster configuration file and define the node and cluster directives (the IP placeholders stand for each node’s address):

# /etc/ocfs2/cluster.conf
cluster:
        node_count = 3
        name = ocfs2
node:
        ip_port = 7777
        ip_address = <NODE1 IP>
        number = 1
        name = NODE1
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = <NODE2 IP>
        number = 2
        name = NODE2
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = <NODE3 IP>
        number = 3
        name = NODE3
        cluster = ocfs2

*Note: the attributes under a node or cluster clause must be indented with a tab.


** The following steps should be performed on NODE1, NODE2 and NODE3 unless specified.


5. Create the same configuration file (/etc/ocfs2/cluster.conf) on NODE2 and NODE3. The file must be identical on all nodes in the cluster, and any change must be propagated to the other nodes; see the example below.
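
For example, from NODE1 (assuming SSH access with sufficient privileges on the other nodes):

$ scp /etc/ocfs2/cluster.conf NODE2:/etc/ocfs2/cluster.conf
$ scp /etc/ocfs2/cluster.conf NODE3:/etc/ocfs2/cluster.conf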


6. Enable the o2cb driver so it loads on boot on all nodes:

$ sudo sed -i "s|false|true|g" /etc/default/o2cb


7. Restart the iSCSI initiator to pick up the newly created disk partition:

$ sudo service open-iscsi restart


8. Restart o2cb service to apply the changes in /etc/ocfs2/cluster.conf:

$ sudo service o2cb restart


9. Create the web files directory under /var/www:

$ sudo mkdir -p /var/www/magento


10. Get the block ID for /dev/sdb1. Using the UUID in fstab is recommended for iSCSI devices:

$ sudo blkid /dev/sdb1 | awk '{print $3}'


11. Add the following line to /etc/fstab:

UUID=82b1d98c-30e7-4ade-ab9b-590f857797fd		/var/www/magento     ocfs2   defaults,_netdev        0 0


12. Mount the filesystem:

$ sudo mount -a


13. On NODE1, uncompress and copy the Magento web files into /var/www/magento and set up directory permissions:

$ tar -xzf magento-
$ sudo cp -Rf magento/* /var/www/magento
$ sudo chown -R www-data:www-data /var/www/magento
$ sudo chmod 777 /var/www/magento/app/etc
$ sudo chmod 777 -Rf /var/www/magento/var
$ sudo chmod 777 -Rf /var/www/magento/media


Configure nginx and PHP-FPM


** The following steps should be performed on NODE1, NODE2 and NODE3.


1. We will use nginx as the web server for Magento. Install nginx and all required PHP modules:

$ sudo apt-get install nginx php5-common php5-cli php5-fpm php5-mysql php5-mcrypt php5-gd php5-curl php-soap


2. Create an nginx virtual host configuration file at /etc/nginx/sites-available/magento and add the following lines:

# /etc/nginx/sites-available/magento
server {
    root /var/www/magento;

    location / {
        index index.html index.php;
        try_files $uri $uri/ @handler;
        expires 30d;
    }

    location /app/                { deny all; }
    location /includes/           { deny all; }
    location /lib/                { deny all; }
    location /media/downloadable/ { deny all; }
    location /pkginfo/            { deny all; }
    location /report/config.xml   { deny all; }
    location /var/                { deny all; }

    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }

    location @handler {
        rewrite / /index.php;
    }

    location ~ \.php/ {
        rewrite ^(.*\.php)/ $1 last;
    }

    location ~ \.php$ {
        if (!-e $request_filename) { rewrite / /index.php last; }
        expires        off;
        # assumes php5-fpm listening on its default TCP port
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
        fastcgi_param  MAGE_RUN_CODE default;
        fastcgi_param  MAGE_RUN_TYPE store;
        include        fastcgi_params;
    }
}


3. Create a symbolic link in the sites-enabled directory to enable the magento virtual host:

$ cd /etc/nginx/sites-enabled
$ sudo ln -s /etc/nginx/sites-available/magento magento


4. Restart nginx and PHP:

$ sudo service php5-fpm restart
$ sudo service nginx restart


Load Balancer and Failover


Instead of using HAProxy for SQL load balancing, we will follow some of the suggestions in this article and simply have each Magento instance connect to its local MySQL server over localhost, with the following criteria:

    • Magento on each node will connect to its local MySQL database using localhost, bypassing HAProxy.
    • Load balancing at the database layer is only for the mysql client/console; HAProxy will be used to balance HTTP.
    • Keepalived will be used to hold the virtual IP on load balancers LB1 and LB2.

If you plan to place the MySQL servers on separate hosts, then the Magento instances should connect to the database cluster via HAProxy instead.
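
In practice, the local-connection setup means the database section of Magento’s app/etc/local.xml points at localhost on every web node. An excerpt of what that section roughly looks like (credentials are placeholders):

<connection>
    <host><![CDATA[localhost]]></host>
    <username><![CDATA[magento_user]]></username>
    <password><![CDATA[your_password]]></password>
    <dbname><![CDATA[magento_site]]></dbname>
    <active>1</active>
</connection>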


** The following steps should be performed on ST1


1. We have created scripts to install HAProxy and Keepalived; these can be obtained from our Git repository.

Install git and clone the repo:

$ apt-get install -y git
$ git clone


2. Make sure LB1 and LB2 are accessible using passwordless SSH. Copy the SSH keys to LB1 and LB2:

$ ssh-copy-id -i ~/.ssh/id_rsa
$ ssh-copy-id -i ~/.ssh/id_rsa


3. Install HAProxy on both nodes:

$ cd s9s-admin/cluster/
$ sudo ./s9s_haproxy --install -i 1 -h
$ sudo ./s9s_haproxy --install -i 1 -h


4. Install Keepalived on LB1 (master) and LB2 (backup) with the virtual IP:

$ sudo ./s9s_haproxy --install-keepalived -i 1 -x -y -v


** The following steps should be performed on LB1 and LB2


5. By default, the script configures the MySQL reverse proxy service to listen on port 33306. We need to add a few more lines to tell HAProxy to load balance our web server farm as well. Add the following lines to /etc/haproxy/haproxy.cfg:

frontend http-in
    bind *:80
    default_backend web_farm

backend web_farm
    # host names resolve via /etc/hosts; port 80 assumed for the nginx backends
    server NODE1 NODE1:80 maxconn 32
    server NODE2 NODE2:80 maxconn 32
    server NODE3 NODE3:80 maxconn 32


6. Restart HAProxy service:

$ sudo killall haproxy
$ sudo /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/ -st `cat /var/run/`


Install Magento


1. Now that we have a load-balanced setup ready to support Magento, we will create our Magento database. From the ClusterControl UI, go to Manage > Schema and Users > Create Database to create the database:


2. Create the database user under the Privileges tab:


3. Assign the correct privileges for magento_user on database magento_site:
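
If you prefer the mysql console over the UI, the equivalent grant would look roughly like this (run it on one DB node; Galera replicates it, and the password is a placeholder):

mysql> GRANT ALL PRIVILEGES ON magento_site.* TO 'magento_user'@'%' IDENTIFIED BY 'your_password';
mysql> FLUSH PRIVILEGES;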

At this point, we assume you have pointed your store’s domain names to the virtual IP.


4. Open a web browser and go to your store URL. You should see an installation page similar to the screenshot below:


* Take note that we are using localhost as the database host value and that session data will be saved in the database. This allows users to keep the same session regardless of which web server they are connected to; see the snippet below.
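
Saving sessions in the database corresponds to the db session handler in app/etc/local.xml, roughly:

<session_save><![CDATA[db]]></session_save>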




** Updated on 9th Dec 2013 **

By default Magento will set up a MyISAM table specifically for FULLTEXT indexing called catalogsearch_fulltext. MyISAM tables are supported within MySQL Galera Cluster, but only with basic support, primarily because the storage engine is non-transactional, so Galera cannot guarantee the data will remain consistent within the cluster.

Codership has released MySQL-wsrep 5.6 with Galera 3.0, which is in beta at the time of this update. You could either use MySQL-wsrep 5.6, which supports InnoDB full-text search (FTS), or convert all non-Galera-friendly tables to InnoDB with primary keys. Alternatively, you can use an external search engine (such as Solr or Sphinx) for FTS capabilities.

If you choose the latter option, you need to convert some of the tables to work well with Galera by executing the following queries on one of the DB nodes:

mysql> ALTER TABLE magento.widget_instance_page_layout ADD id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY FIRST;
mysql> ALTER TABLE magento.catalogsearch_fulltext ENGINE='InnoDB';



Verify The Architecture


1. Check the HAProxy statistics by logging into the HAProxy admin page at LB1 host port 9600. The default username/password is admin/admin. You should see some bytes in and out on the web_farm and s9s_33306_production sections:


2. Check and observe the traffic on your database cluster from the ClusterControl overview page:


There are many improvements that could be made to this setup. For example, you could provide redundancy for the shared storage server by installing DRBD. You could also add Varnish Cache on the load balancing servers to provide better caching of static content and reduce the load on the web and database servers.


via How to Cluster Magento, nginx and MySQL on Multiple Servers for High Availability | Severalnines.

Because Hadoop isn’t perfect: 8 ways to replace HDFS


Hadoop is on its way to becoming the de facto platform for the next-generation of data-based applications, but it’s not without flaws. Ironically, one of Hadoop’s biggest shortcomings now is also one of its biggest strengths going forward — the Hadoop Distributed File System.

Within the Apache Software Foundation, HDFS is always improving in terms of performance and availability. Honestly, it’s probably fine for the majority of Hadoop workloads running in pilot projects, skunkworks projects or generally non-demanding environments. And technologies such as HBase that are built atop HDFS speak to its versatility as a storage system, even for non-MapReduce applications.

But if the growing number of options for replacing HDFS signifies anything, it’s that HDFS isn’t quite where it needs to be. Some Hadoop users have strict demands around performance, availability and enterprise-grade features, while others aren’t keen on its direct-attached storage (DAS) architecture. Concerns around availability might be especially valid for anyone (read “almost everyone”) who’s using an older version of Hadoop without the High Availability NameNode. Here are eight products and projects whose proprietors argue they can deliver what HDFS can’t:


Cassandra (DataStax)


Not a file system at all but an open source, NoSQL key-value store, Cassandra has become a viable alternative to HDFS for web applications that rely on fast data access. DataStax, a startup commercializing the Cassandra database, has fused Hadoop atop Cassandra to provide web applications fast access to data processed by Hadoop, and Hadoop fast access to data streaming into Cassandra from web users.



Ceph (Inktank)


Ceph is an open source, multi-pronged storage system that was recently commercialized by a startup called Inktank. Among its features is a high-performance parallel file system that some think makes it a candidate for replacing HDFS (and then some) in Hadoop environments. Indeed, some researchers started looking at this possibility as far back as 2010.


Dispersed Storage Network (Cleversafe)


Cleversafe got into the HDFS-replacement business on Monday, announcing a product that will fuse Hadoop MapReduce with the company’s Dispersed Storage Network system. By fully distributing metadata across the cluster (instead of relying on a single NameNode) and not relying on replication, Cleversafe says it’s much faster, more reliable and scalable than HDFS.




GPFS (IBM)


IBM has been selling its General Parallel File System to high-performance computing customers for years (including within some of the world’s fastest supercomputers), and in 2010 it tuned GPFS for Hadoop. IBM claims the GPFS-SNC (Shared Nothing Cluster) edition is so much faster than Hadoop in part because it runs at the kernel level, as opposed to atop the OS like HDFS.

Isilon (EMC)


EMC has offered its own Hadoop distributions for more than a year, but in January 2012 it unveiled a new method for making HDFS enterprise-class — replace it with EMC Isilon’s OneFS file system. Technically, as EMC’s Chuck Hollis explained at the time, because Isilon can read NFS, CIFS and HDFS protocols, a single Isilon NAS system can serve to intake, process and analyze data.



Lustre


Lustre is an open source, high-performance file system that some claim can make for an HDFS alternative where performance is a major concern. Truth be told, I haven’t heard of this combination running anywhere in the wild, but HPC storage provider Xyratex wrote a paper on the combination in 2011, claiming a Lustre-based cluster (even with InfiniBand) will be faster and cheaper than an HDFS-based cluster.

MapR File System



The MapR File System is probably the best-known HDFS alternative, as it’s the basis of MapR’s increasingly popular — and well-funded — Hadoop distribution. Not only does MapR claim its file system is two to five times faster than HDFS on average (although, really, up to 20 times faster), but it has features such as mirroring, snapshots and high availability that enterprise customers love.

NetApp Open Solution for Hadoop




OK, the NetApp Open Solution for Hadoop isn’t so much an HDFS replacement as it is an HDFS improvement, according to NetApp and early partner Cloudera. The offering still relies on HDFS, but it reenvisions the physical Hadoop architecture by putting HDFS on a RAID array. This, NetApp claims, means faster, more reliable and more secure Hadoop jobs.

This might be a good place to say rest in peace to two other HDFS alternatives that are effectively no longer with us — KosmosFS (aka CloudStore) and Appistry CloudIQ Storage. The former was created by Kosmix (since bought by @WalmartLabs) and released to the open source world in 2007, but no longer has an active community. The latter was an attempt by Appistry in 2010 to get a piece of the Hadoop pie with its computational storage technology, but the company has since switched its focus from selling the technology to providing high-performance computing services based on it.

via Because Hadoop isn’t perfect: 8 ways to replace HDFS — Tech News and Analysis.