The hashes generally used for things like hash tables don't need to be cryptographically secure, and as such can be substantially simpler. When you start getting into cryptographically secure hashes things get complex enough that you start ending up with timing variations if you're not very careful. Especially once you start talking about table lookups / etc (although that's more common with encryption algorithms than straight hash algorithms).
And the compiler - with C and C++ at least - is free to optimize in ways that introduce timing variations. Which is what sparked this conversation in the first place. So even if your source code doesn't take data-dependent time, the compiler may make the compiled output take data-dependent time. And there's no way around that in pure C / C++.
Ok, fine, so there are no real-time, optimization, or scheduling guarantees in C / C++ and it's better to be safe than sorry. But why does it actually matter if either the hash function or hash value comparison takes data-dependent time? How can you use that information to recover the password?
One of these should take slightly more time than the others. Let's say it's 'carmen'. I now know that the first character of the password hash is 'f'. I then try passwords where the first character of the password hash is 'f' and the second is different. Repeat until I have the entire password.
Note that this can also turn an online brute-force attack into an offline brute-force attack with a small online component. (The only difference being instead of a password dictionary, you brute-force for hashes with the known bits.)
Note that a salt - if the salt is never leaked - prevents this attack.
(Except that if you have an account yourself you can do a timing attack against your own account to try to figure out the salting scheme!)
--------
If the password hash itself takes data-dependent time... There are plenty of ways to go from that to breaking passwords. I could elaborate if you wish.
Hey this is pretty interesting. Thanks for taking the time to explain. My first reaction was that you have to know the hash function for this attack to work, but I guess the salt serves the same purpose as hiding the hash function, and is more practical since then you can use a generic hash. Presumably someone figured out a way to prevent timing attacks against your own account - what is it?
Sure, assume that the hash takes data-dependent time, but that's the only vulnerability. I can see that this might reveal password length, but not easily beyond that. How does it work? Does SHA256 with salt take constant time?
In general, what is the most secure password scheme if you're looking to prevent timing attacks? Do you need real-time guarantees? You have enough material for a nice "evolution of secure password authentication" article here, if you ever wanted to write it up.
> Presumably someone figured out a way to prevent timing attacks against your own account - what is it?
As long as the salt has sufficient entropy, this isn't a problem. So instead of generating the salt from the username / etc, you use a random salt and store it.
> How does it work?
Effectively, by leaking internal state of the cypher / hash through timing variations. The most common ways of doing so are either through data-dependent table lookups or through operations that may be microcoded or otherwise take variable time (multiplications, rotations sometimes, divisions, modulo, squaring).
See, for instance, http://cr.yp.to/streamciphers/leaks.html - although note that this is dealing with stream cyphers not hash functions. In particular, the linked paper describes an effective timing attack against AES by taking advantage of processor cache effects on a data-dependent table lookup. I'd give it a read - it's quite interesting.
SHA256 is pretty good w.r.t. being close to constant time for a fixed input length. Not perfect, but meh. It uses bitwise xor and and, rotates, shifts, and addition, none of which are generally variable-time (although note that on some platforms rotates are variable-time).
For most things, SHA2, by which I mean the newer variants, works reasonably well. In particular, you store SHA2(password xor random_salt xor pepper) and discard the original password. Store the salt in the database with the user, and if you want to get really fancy you embed a single random value ("pepper") in the application code - this is a kind of last-ditch effort to hopefully prevent user passwords from being leaked even if the database is leaked.
If you really want to get fancy, you also embed a couple "canary" accounts (random made-up accounts) in your database, with a passive device on the network that screams bloody murder if any of those accounts ever get successfully logged in to.
However, that being said, that doesn't really answer your question.
A couple things:
1) You don't want to use SHA, generally. You want to use a proper KDF (key derivation function). Why? Standard hash functions are designed to be fast and not take much memory. Which means it is a whole lot easier to crack.
2) If any of the user authentication is in a language that you cannot disable optimizations in, and you cannot look at the assembly code coming out, you cannot prevent timing attacks. You may be able to mitigate them, but that is all.
3) In general, if a KDF / hash function doesn't use any data-dependent table lookups / pointer chasing / control flow, and doesn't do any complex functions (multiplication, etc), it's probably relatively safe from timing attacks. Otherwise... Not so much.
4) Theoretically, you need cooperation of the kernel, etc, to truly be safe from timing attacks. Among other things, context switches could leak information. In actuality... It's definitely possible but I don't know of anyone who has made it work.
This stuff is so interesting. D.J. Bernstein is certainly prolific! It's cool how processor-specific that cache-based attack is. It's weird, I have a compilers background but (evidently) I've never even thought about timing attacks. There isn't a lot of overlap between the PL and crypto communities I guess. I know we used to use asm volatile with gcc but apparently that isn't even a guaranteed scheduling barrier anymore.
Anyway, my curiosity is satisfied for now, but thanks again for sharing, and keep posting about this stuff.
The hashes generally used for things like hash tables don't need to be cryptographically secure, and as such can be substantially simpler. When you start getting into cryptographically secure hashes things get complex enough that you start ending up with timing variations if you're not very careful. Especially once you start talking about table lookups / etc (although that's more common with encryption algorithms than straight hash algorithms).
And the compiler - with C and C++ at least - is free to optimize in ways that introduce timing variations. Which is what sparked this conversation in the first place. So even if your source code doesn't take data-dependent time, the compiler may make the compiled output take data-dependent time. And there's no way around that in pure C / C++.
reply