Thursday, June 07, 2012

What Happened to LinkedIn

Yesterday we learned that LinkedIn was hacked and someone got over 6 million password hashes. Here's the Reuters report, "LinkedIn suffers data breach." We don't know the details of how they got in, but we do know about the passwords. In that regard, LinkedIn almost did a reasonable job of protecting passwords. Here's an explanation (I hope) for the layman.

Web sites don't, or shouldn't, store your actual password. If someone got (or had) access, that would be too easy (or tempting) to steal. Instead they hash the passwords and store the hashes. A hash function takes some string (of letters or numbers or whatever) as input and mashes it up into a number of some fixed length. It's important that the function uses just the string and not other stuff like a count or the time. That way, whenever you hash that same string you get the same number as a result. So a hash function might turn "Alice" into 1234 and "Bob" into 5678. Since you can enter any string as input and it always gives a fixed-length number, there can be duplicates, but that's OK as long as you always get the same result.

The way computers use hashes for passwords is easy. When you create your password, the computer hashes it and stores the hash. Then when you log in, you enter your password and the computer hashes it and compares it to the hash it has stored. If they match, you're in. If not, you try again. So this should be good, right? Only you can log in, the computer doesn't know your password (just the hash), and even if someone steals the hash they don't have your password. Almost.
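Here's a minimal sketch of that store-and-compare flow in Python. It assumes SHA-1 (the function discussed further down) from the standard hashlib module, and the names `stored_hashes`, `create_account` and `log_in` are made up for illustration. It is not how LinkedIn or any particular site actually does it.

```python
import hashlib
import hmac

# Hypothetical stand-in for the site's password file: username -> hash.
stored_hashes = {}

def hash_password(password):
    """Return the SHA-1 hash of the password as a hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def create_account(username, password):
    # Only the hash is kept; the password itself is never stored.
    stored_hashes[username] = hash_password(password)

def log_in(username, password):
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # Hash what the user typed and compare it to the stored hash.
    return hmac.compare_digest(hash_password(password), stored)

create_account("alice", "burger")
print(log_in("alice", "burger"))  # True
print(log_in("alice", "pizza"))   # False
```

Note that the password file only ever holds the hex string, never "burger" itself.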

While it's hard to turn 1234 into "Alice", computers are good at doing lots of things over and over. What if someone knew the hash function used and got the file of all the hashed passwords? They could just try every string and check the results. Maybe "Charlie" hashes to 3456 and "Delta" to 1023. It would be easy to write a program to try every string, or say every string up to 8 letters long, and record the hash result. Now they could take a hash from the password file and look it up in their results to see what string hashes to that value. This is called a dictionary attack because the attacker is building a dictionary of all the hash values.
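To make the idea concrete, here's a small sketch of building such a dictionary. It assumes SHA-1 via Python's hashlib and only covers lowercase strings up to 3 letters so it runs instantly. The helper names `sha1_hex` and `build_table` are invented for this example.

```python
import hashlib
import itertools
import string

def sha1_hex(text):
    """Return the SHA-1 hash of a string as hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def build_table(max_length):
    """Map hash -> string for every lowercase string up to max_length."""
    table = {}
    for length in range(1, max_length + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            table[sha1_hex(candidate)] = candidate
    return table

table = build_table(3)  # 26 + 26^2 + 26^3 = 18,278 entries

# Pretend this hash came out of a stolen password file.
stolen_hash = sha1_hex("cat")
print(table.get(stolen_hash))  # cat
```

The lookup is instant once the table exists. All the work is in building it, and the table can be reused against every hash in the file.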

A cryptographic hash function is just a very strong one. It uses very big numbers, so it should rarely result in duplicates, and it is very hard to undo. It should be impossible (with current knowledge) to turn 1234 back into "Alice", and even if you make a small change to the input, say to "alice", it should result in a very different number. The current standard cryptographic hash function is SHA-1. It was designed by the NSA, and while some theoretical weaknesses have been discovered, it still hasn't been broken. There is a new standard in development, but current best practice is to use SHA-1. It turns strings into numbers 160 bits long, which is about 49 decimal digits.
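A quick way to see those properties is with Python's hashlib, which includes SHA-1. This is only a sketch, and the helper `sha1_int` is a name made up here.

```python
import hashlib

def sha1_int(text):
    """Return the SHA-1 hash of a string as one big integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

# The output is always 160 bits, no matter how long the input is.
print(hashlib.sha1().digest_size * 8)  # 160

# The largest 160-bit number has 49 decimal digits.
print(len(str(2**160 - 1)))  # 49

# Changing one letter's case gives a very different number.
upper = sha1_int("Alice")
lower = sha1_int("alice")
differing_bits = bin(upper ^ lower).count("1")
print(upper)
print(lower)
print(differing_bits, "of 160 bits differ")
```

For a good hash function you'd expect roughly half of the bits to differ between the two results.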

But even cryptographic hash functions are subject to dictionary attacks. If every site just hashed its passwords with SHA-1, then it would be useful for attackers to build one big dictionary of the hashes of all strings of password length. They could reuse it for every site on the internet. Except for one thing: salt. The idea is that each site, before it hashes the password, uses one number, known as salt, to change the password into some other string, and then it hashes that. So Amazon might use 3221 (well, much bigger) for its salt and Yahoo might use 9873, so even if Alice uses the same password at both sites, the sites will store very different hashes. Now it's not so useful to build a dictionary of SHA-1 hashes, because the attackers need to build one for each salt value.
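Here's a sketch of per-site salting as described above, again assuming SHA-1 from Python's hashlib. The salt values are the toy numbers from the text, and simply putting the salt in front of the password is one assumed way to combine them, not something any particular site is known to do.

```python
import hashlib

def salted_hash(site_salt, password):
    """Hash the site's salt together with the password."""
    return hashlib.sha1(site_salt + password.encode("utf-8")).hexdigest()

# Toy salts from the text; real ones would be much longer.
amazon_salt = b"3221"
yahoo_salt = b"9873"

password = "burger"
print(salted_hash(amazon_salt, password))
print(salted_hash(yahoo_salt, password))

# Neither matches the plain, unsalted SHA-1 of the same password,
# so a dictionary of unsalted hashes finds nothing.
print(hashlib.sha1(password.encode("utf-8")).hexdigest())
```

The same password produces three unrelated hashes, so a dictionary built for one salt is useless against another.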

And that's what LinkedIn forgot: the salt. They just used SHA-1 on your password and stored that. So now hackers are hashing all the random strings they can and comparing the results to the hashes they found at LinkedIn. If they find a match, they know your LinkedIn password.

So now you should understand why there are all these annoying guidelines about picking passwords. How many strings do hackers have to try to find your password hash? Let's say your password is 4 lowercase letters. That's 26^4 or 456,976 possible strings. A 10-letter password is over 141 trillion possibilities. That's big, but still small for a computer to try. But computers treat upper and lowercase letters differently. So if some of the letters in your password are uppercase, then the attackers have to try 52^10 or over 144 quadrillion strings. Add digits or punctuation marks and attackers have to try over 3 quintillion 10-character strings. Make it 13 characters long and they have to try over a septillion possibilities. This is also the reason you don't want to pick known words. Instead of just trying every combination of characters alphabetically, attackers will hash words they find in the dictionary first in the hope of cutting down their attempts.
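The arithmetic is easy to check: the number of possible passwords is the alphabet size raised to the password length. In this sketch, `search_space` is a made-up helper name, and the 72-character alphabet (52 letters, 10 digits, 10 punctuation marks) is an assumption that happens to fit the figures above, not something stated in the text.

```python
def search_space(alphabet_size, length):
    """Number of distinct strings of this length over this alphabet."""
    return alphabet_size ** length

print(f"{search_space(26, 4):,}")   # 456,976
print(f"{search_space(26, 10):,}")  # 141,167,095,653,376
print(f"{search_space(52, 10):,}")  # 144,555,105,949,057,024

# Assumed alphabet: 52 letters + 10 digits + 10 punctuation marks.
mixed = 72
print(search_space(mixed, 10) > 3 * 10**18)  # True (over 3 quintillion)
print(search_space(mixed, 13) > 10**24)      # True (over a septillion)
```

Each extra character multiplies the attacker's work by the size of the alphabet, which is why length and variety both matter.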

This is also why you shouldn't reuse the same password at different sites. Once they know your LinkedIn password, then they'll try it on your Facebook account and email and bank account, etc. Your password is only as strong as the weakest (or dumbest) site where it's stored.

2 comments:

The Dad said...

Your last point, IMHO, is the most important one. All those notices telling people to change their LinkedIn password seems less important to me than changing your other website passwords, if in fact you use the same one in multiple places. If they guess my password is "burger" and they match my email address, they could try logging in to Amazon with the same email and pwd, no?

Howard said...

Yes. See my next post. :)