Archive for the ‘ Security ’ Category

Post-mortem Windows Azure

A curious software bug which caused a two day outrage of Windows Azur:


As a follow-up to my March 1 posting, I want to share the findings of our root cause analysis of the service disruption of February 29th. We know that many of our customers were impacted by this event and we want to be transparent about what happened, what issues we found, how we plan to address these issues, and how we are learning from the incident to prevent a similar occurrence in the future.

Again, we sincerely apologize for the disruption, downtime and inconvenience this incident has caused. We will be proactively issuing a service credit to our impacted customers as explained below. Rest assured that we are already hard at work using our learnings to improve Windows Azure.

Overview of Windows Azure and the Service Disruption

Windows Azure comprises many different services, including Compute, Storage, Networking and higher-level services like Service Bus and SQL Azure. This partial service outage impacted Windows Azure Compute and dependent services: Access Control Service (ACS), Windows Azure Service Bus, SQL Azure Portal, and Data Sync Services. It did not impact Windows Azure Storage or SQL Azure.

While the trigger for this incident was a specific software bug, Windows Azure consists of many components and there were other interactions with normal operations that complicated this disruption. There were two phases to this incident. The first phase was focused on the detection, response and fix of the initial software bug. The second phase was focused on the handful of clusters that were impacted due to unanticipated interactions with our normal servicing operations that were underway. Understanding the technical details of the issue requires some background on the functioning of some of the low-level Windows Azure components.

Fabric Controllers, Agents and Certificates

In Windows Azure, cloud applications consist of virtual machines running on physical servers in Microsoft datacenters. Servers are grouped into “clusters” of about 1000 that are each independently managed by a scaled-out and redundant platform software component called the Fabric Controller (FC), as depicted in Figure 1. Each FC manages the lifecycle of applications running in its cluster, provisions and monitors the health of the hardware under its control. It executes both autonomic operations, like reincarnating virtual machine instances on healthy servers when it determines that a server has failed, as well as application-management operations like deploying, updating and scaling out applications. Dividing the datacenter into clusters isolates faults at the FC level, preventing certain classes of errors from affecting servers beyond the cluster in which they occur.

Figure 1. Clusters and Fabric Controllers

Part of Windows Azure’s Platform as a Service (PaaS) functionality requires its tight integration with applications that run in VMs through the use of a “guest agent” (GA) that it deploys into the OS image used by the VMs, shown in Figure 2. Each server has a “host agent” (HA) that the FC leverages to deploy application secrets, like SSL certificates that an application includes in its package for securing HTTPS endpoints, as well as to “heart beat” with the GA to determine whether the VM is healthy or if the FC should take recovery actions.

Figure 2. Host Agent and Guest Agent Initialization

So that the application secrets, like certificates, are always encrypted when transmitted over the physical or logical networks, the GA creates a “transfer certificate” when it initializes. The first step the GA takes during the setup of its connection with the HA is to pass the HA the public key version of the transfer certificate. The HA can then encrypt secrets and because only the GA has the private key, only the GA in the target VM can decrypt those secrets.

There are several cases that require generation of a new transfer certificate. Most of the time that’s only when a new VM is created, which occurs when a user launches a new deployment, when a deployment scales out, or when a deployment updates its VM operating system. The fourth case is when the FC reincarnates a VM that was running on a server it has deemed unhealthy to a different server, a process the platform calls “service healing.”

The Leap Day Bug

When the GA creates the transfer certificate, it gives it a one year validity range. It uses midnight UST of the current day as the valid-from date and one year from that date as the valid-to date. The leap day bug is that the GA calculated the valid-to date by simply taking the current date and adding one to its year. That meant that any GA that tried to create a transfer certificate on leap day set a valid-to date of February 29, 2013, an invalid date that caused the certificate creation to fail.

As mentioned, transfer certificate creation is the first step of the GA initialization and is required before it will connect to the HA. When a GA fails to create its certificates, it terminates. The HA has a 25-minute timeout for hearing from the GA. When a GA doesn’t connect within that timeout, the HA reinitializes the VM’s OS and restarts it.

If a clean VM (one in which no customer code has executed) times out its GA connection three times in a row, the HA decides that a hardware problem must be the cause since the GA would otherwise have reported an error. The HA then reports to the FC that the server is faulty and the FC moves it to a state called Human Investigate (HI). As part of its standard autonomic failure recovery operations for a server in the HI state, the FC will service heal any VMs that were assigned to the failed server by reincarnating them to other servers. In a case like this, when the VMs are moved to available servers the leap day bug will reproduce during GA initialization, resulting in a cascade of servers that move to HI.

To prevent a cascading software bug from causing the outage of an entire cluster, the FC has an HI threshold, that when hit, essentially moves the whole cluster to a similar HI state. At that point the FC stops all internally initiated software updates and automatic service healing is disabled. This state, while degraded, gives operators the opportunity to take control and repair the problem before it progresses further.

The Leap Day Bug in Action

The leap day bug immediately triggered at 4:00PM PST, February 28th (00:00 UST February 29th) when GAs in new VMs tried to generate certificates. Storage clusters were not affected because they don’t run with a GA, but normal application deployment, scale-out and service healing would have resulted in new VM creation. At the same time many clusters were also in the midst of the rollout of a new version of the FC, HA and GA. That ensured that the bug would be hit immediately in those clusters and the server HI threshold hit precisely 75 minutes (3 times 25 minute timeout) later at 5:15PM PST. The bug worked its way more slowly through clusters that were not being updated, but the critical alarms on the updating clusters automatically stopped the updates and alerted operations staff to the problem. They in turn notified on-call FC developers, who researched the cause and at 6:38PM PST our developers identified the bug.

By this time some applications had single VMs offline and some also had multiple VMs offline, but most applications with multiple VMs maintained availability, albeit with some reduced capacity. To prevent customers from inadvertently causing further impact to their running applications, unsuccessfully scaling-out their applications, and fruitlessly trying to deploy new applications, we disabled service management functionality in all clusters worldwide at 6:55PM PST. This is the first time we’ve ever taken this step. Service management allows customers to deploy, update, stop and scale their applications but isn’t necessary for the continued operation of already deployed applications. However stopping service management prevents customers from modifying or updating their currently deployed applications.

We created a test and rollout plan for the updated GA by approximately 10:00PM PST, had the updated GA code ready at 11:20PM PST, and finished testing it in a test cluster at 1:50AM PST, February 29th. In parallel, we successfully tested the fix in production clusters on the VMs of several of our own applications. We next initiated rollout of the GA to one production cluster and that completed successfully at 2:11AM PST, at which time we pushed the fix to all clusters. As clusters were updated we restored service management functionality for them and at 5:23AM PST we announced service management had been restored to the majority of our clusters.

Secondary Outage ……

Read the rest of the story:

via Summary of Windows Azure Service Disruption on Feb 29th, 2012 – Windows Azure – Site Home – MSDN Blogs.

THC-HYDRA – fast and flexible network login hacker

THC-HYDRA – fast and flexible network login hacker.


A very fast network logon cracker which support many different services
state-of-the-art: hydra-6.1-src.tar.gz (IPv6 and new modules)
stable: hydra-5.9.1-src.tar.gz (just fixes)

Last update 2011-02-03

[0x00] News and Changelog

Hydra is now over 10 years old! Yeah!
Good news: hydra is now co-maintained by co-maintained by David Maciejak @ gmail (dot) com,
thanks a lot!
Hydra is made available under GPLv3 with a special OpenSSL license expansion.
No more windows .exe cygwin port. Too many clueless people hassled me why hydra.exe
does not work for them when they double-click on it … duh

Because v6 contains a lot of new code for IPv6, 5.9.x is kept and maintained further until v6 is stable.
It was tested to work on Linux, Windows/Cygwin, Solaris 11, FreeBSD 8.1 and OSX.

CHANGELOG for 6.1 (development and new features)
* More license updates for the files for the debian guys
* Fix for the configure script to correctly detect postgresql
* Add checks for libssh v0.4 and support for ssh v1
* Merge all latest crypto code in sasl files
* Fix SVN compilation issue on openSUSE (tested with v11.3)

CHANGELOG for 6.0 (development and new features)
* Added GPL exception clause to license to allow linking to OpenSSL – debian people need this
* IPv6 support finally added. Note: sip and socks5 modules do not support IPv6 yet
* Bugfix for SIP module, thanks to yori(at)counterhackchallenges(dot)com
* Compile fixes for systems without OpenSSL or old OpenSSL installations
* Eliminated compile time warnings
* xhydra updates to support the new features (david@)
* Added CRAM-MD5, DIGEST-MD5 auth mechanism to the smtp-auth module (david@)
* Added LOGIN, PLAIN, CRAM-(MD5,SHA1,SHA256) and DIGEST-MD5 auth mechanisms to the imap and pop3 modules (david@)
* Added APOP auth to POP3 module (david@)
* Added NTLM and DIGEST-MD5 to http-auth module and DIGEST-MD5 to http-proxy module (david@)
* Fixed VNC module for None and VLC auth (david@)
* Fixes for LDAP module (david@)
* Bugfix Telnet module linemode option negotiation using win7 (david@)
* Bugfix SSH module when max auth connection is reached (david@)

CHANGELOG for 5.9.1 (stable)
* Fixes for SSH, VNC and LDAP (all by david@)

Have fun!

Some dictionaries are listed here:

Hack Weak Passwords

How I’d Hack Your Weak Passwords.

How I’d Hack Your Weak Passwords

by John P.

User LoginIf you invited me to try and crack your password, you know the one that you use over and over for like every web page you visit, how many guesses would it take before I got it?

Let’s see… here is my top 10 list. I can obtain most of this information much easier than you think, then I might just be able to get into your e-mail, computer, or online banking. After all, if I get into one I’ll probably get into all of them.

  1. Your partner, child, or pet’s name, possibly followed by a 0 or 1 (because they’re always making you use a number, aren’t they?)
  2. The last 4 digits of your social security number.
  3. 123 or 1234 or 123456.
  4. “password”
  5. Your city, or college, football team name.
  6. Date of birth – yours, your partner’s or your child’s.
  7. “god”
  8. “letmein”
  9. “money”
  10. “love”

Statistically speaking that should probably cover about 20% of you. But don’t worry. If I didn’t get it yet it will probably only take a few more minutes before I do…

Hackers, and I’m not talking about the ethical kind, have developed a whole range of tools to get at your personal data. And the main impediment standing between your information remaining safe, or leaking out, is the password you choose. (Ironically, the best protection people have is usually the one they take least seriously.)

One of the simplest ways to gain access to your information is through the use of a Brute Force Attack. This is accomplished when a hacker uses a specially written piece of software to attempt to log into a site using your credentials. has a list of the Top 10 FREE Password Crackers right here.

So, how would one use this process to actually breach your personal security? Simple. Follow my logic:

  • You probably use the same password for lots of stuff right?
  • Some sites you access such as your Bank or work VPN probably have pretty decent security, so I’m not going to attack them.
  • However, other sites like the Hallmark e-mail greeting cards site, an online forum you frequent, or an e-commerce site you’ve shopped at might not be as well prepared. So those are the ones I’d work on.
  • So, all we have to do now is unleash Brutus, wwwhack, or THC Hydra on their server with instructions to try say 10,000 (or 100,000 – whatever makes you happy) different usernames and passwords as fast as possible.
  • Once we’ve got several login+password pairings we can then go back and test them on targeted sites.
  • But wait… How do I know which bank you use and what your login ID is for the sites you frequent? All those cookies are simply stored, unencrypted and nicely named, in your Web browser’s cache. (Read this post to remedy that problem.)

And how fast could this be done? Well, that depends on three main things, the length and complexity of your password, the speed of the hacker’s computer, and the speed of the hacker’s Internet connection.

Assuming the hacker has a reasonably fast connection and PC here is an estimate of the amount of time it would take to generate every possible combination of passwords for a given number of characters. After generating the list it’s just a matter of time before the computer runs through all the possibilities – or gets shut down trying.

Pay particular attention to the difference between using only lowercase characters and using all possible characters (uppercase, lowercase, and special characters – like @#$%^&*). Adding just one capital letter and one asterisk would change the processing time for an 8 character password from 2.4 days to 2.1 centuries.

Password Length All Characters Only Lowercase
3 characters
4 characters
5 characters
6 characters
7 characters
8 characters
9 characters
10 characters
11 characters
12 characters
13 characters
14 characters
0.86 seconds
1.36 minutes
2.15 hours
8.51 days
2.21 years
2.10 centuries
20 millennia
1,899 millennia
180,365 millennia
17,184,705 millennia
1,627,797,068 millennia
154,640,721,434 millennia
0.02 seconds
.046 seconds
11.9 seconds
5.15 minutes
2.23 hours
2.42 days
2.07 months
4.48 years
1.16 centuries
3.03 millennia
78.7 millennia
2,046 millennia

Remember, these are just for an average computer, and these assume you aren’t using any word in the dictionary. If Google put their computer to work on it they’d finish about 1,000 times faster.

Now, I could go on for hours and hours more about all sorts of ways to compromise your security and generally make your life miserable – but 95% of those methods begin with compromising your weak password. So, why not just protect yourself from the start and sleep better at night?

Believe me, I understand the need to choose passwords that are memorable. But if you’re going to do that how about using something that no one is ever going to guess AND doesn’t contain any common word or phrase in it.

Here are some password tips:

  1. Randomly substitute numbers for letters that look similar. The letter ‘o’ becomes the number ’0′, or even better an ‘@’ or ‘*’. (i.e. – m0d3ltf0rd… like modelTford)
  2. Randomly throw in capital letters (i.e. – Mod3lTF0rd)
  3. Think of something you were attached to when you were younger, but DON’T CHOOSE A PERSON’S NAME! Every name plus every word in the dictionary will fail under a simple brute force attack.
  4. Maybe a place you loved, or a specific car, an attraction from a vacation, or a favorite restaurant?
  5. You really need to have different username / password combinations for everything. Remember, the technique is to break into anything you access just to figure out your standard password, then compromise everything else. This doesn’t work if you don’t use the same password everywhere.
  6. Since it can be difficult to remember a ton of passwords, I recommend using Roboform for Windows users. It will store all of your passwords in an encrypted format and allow you to use just one master password to access all of them. It will also automatically fill in forms on Web pages, and you can even get versions that allow you to take your password list with you on your PDA, phone or a USB key. If you’d like to download it without having to navigate their web site here is the direct download link.
  7. Mac users can use 1Password. It is essentially the same thing as Roboform, except for Mac, and they even have an iPhone application so you can take them with you too.
  8. Once you’ve thought of a password, try Microsoft’s password strength tester to find out how secure it is.

By request I also created a short RoboForm Tutorial. Hope it helps…

Another thing to keep in mind is that some of the passwords you think matter least actually matter most. For example, some people think that the password to their e-mail box isn’t important because “I don’t get anything sensitive there.” Well, that e-mail box is probably connected to your online banking account. If I can compromise it then I can log into the Bank’s Web site and tell it I’ve forgotten my password to have it e-mailed to me. Now, what were you saying about it not being important?

Often times people also reason that all of their passwords and logins are stored on their computer at home, which is save behind a router or firewall device. Of course, they’ve never bothered to change the default password on that device, so someone could drive up and park near the house, use a laptop to breach the wireless network and then try passwords from this list until they gain control of your network – after which time they will own you!

Now I realize that every day we encounter people who over-exaggerate points in order to move us to action, but trust me this is not one of those times. There are 50 other ways you can be compromised and punished for using weak passwords that I haven’t even mentioned.

I also realize that most people just don’t care about all this until it’s too late and they’ve learned a very hard lesson. But why don’t you do me, and yourself, a favor and take a little action to strengthen your passwords and let me know that all the time I spent on this article wasn’t completely in vain.

Please, be safe. It’s a jungle out there.