CITS3007 lab 9 (week 11) – Passwords

Passwords are one of the oldest,¹ most widely used mechanisms for authenticating users – found everywhere from web apps and APIs, to operating systems and IoT devices. Other authentication methods have since become available (such as biometric logins and hardware tokens), but passwords still underpin access control for the vast majority of services.

However, using passwords alone is increasingly considered insecure. Modern best practice favours multi-factor authentication (MFA), where a password is combined with something the user has (like a phone or hardware key) or something they “are” (like a fingerprint). Relying on passwords alone leaves systems too easily vulnerable to “credential stuffing” (attackers re-using stolen passwords) and brute-force attacks.

The way we hash and store passwords has evolved significantly in the past few decades. Simple hashing algorithms like MD5 or SHA-1 are no longer considered acceptable for use in securing systems. Today, secure systems use slow, salted, key-stretching algorithms like bcrypt, scrypt, or Argon2 to make attacks computationally expensive.

Guidelines have also shifted. Prior to the late 2010s, typical advice for generating passwords was that they should:

Much of the burden was placed on users to come up with strong passwords, remember dozens of them, rotate them regularly, and never write them down.

The principle of psychological acceptability.

Psychological acceptability is one of Saltzer and Schroeder’s classic principles of secure system design, and suggests that security mechanisms should not make the system harder to use than necessary, and should fit naturally into how users think and behave.

But old-style password rules make passwords:

hard to remember: humans are bad at memorising lots of complex, arbitrary strings.
hard to type correctly: instead of being English or other natural language words, they contain digits and punctuation in unusual places.
tempting to reuse, or write down: if people are forced to change passwords frequently, or aren’t encouraged to use a password manager, they’ll likely reuse passwords with slight variations, or write the passwords down (e.g. on a sticky note affixed to the computer screen).

By 2017, many standards bodies, such as NIST, had abandoned these practices. They resulted in users:

In short, as the comic XKCD explains, these practices trained people to use passwords that are hard for humans to remember, but easy for computers to guess.

2. Modern user password best practice

The UK National Cyber Security Centre (NCSC) provides a helpful page of advice for system owners on modern best practices, which is worth reading through.

3. Cryptography questions and exercises

See if you can answer the following questions, after reviewing the material on cryptography in the lectures.

Sample solution

If we run the commands, we get output like the following:

$ printf mypassword | md5sum
34819d7beeabb9260a5c854bc85b3e44  -

The first “word” of output is the actual hash; the “-” represents the name of the file being hashed (in this case, “-” represents standard input).

The hash is a sequence of hexadecimal digits, and represents 16 bytes (or 128 bits).

Each pair of characters in the original hash represents one byte, so if stored in C as an array of bytes, we could write it as follows:

  // fragment 1
  char somehash[] = {0x34, 0x81, 0x9d, 0x7b, 0xee, 0xab, 0xb9, 0x26,
                     0x0a, 0x5c, 0x85, 0x4b, 0xc8, 0x5b, 0x3e, 0x44};

Strings in C also allow us to use hexadecimal escape sequences, so we could also write the following:

  // fragment 2
  char somehash[] = "\x34\x81\x9d\x7b\xee\xab\xb9\x26"
                    "\x0a\x5c\x85\x4b\xc8\x5b\x3e\x44";

The difference is that in fragment 1, somehash is a “plain” array or buffer, of size 16 elements, but in fragment 2, somehash is a null-terminated C string, so the array will be of size 17.

Note that best practice suggests that in the above examples, we should specify an exact size for the array, rather than relying on it being implicitly defined, and should use an enum or #define to specify the size, rather than a magic number – so the first example becomes

  // fragment 1
  enum {
    HASH_SIZE = 16
  };

  char somehash[HASH_SIZE] = {
      0x34, 0x81, 0x9d, 0x7b, 0xee, 0xab, 0xb9, 0x26,
      0x0a, 0x5c, 0x85, 0x4b, 0xc8, 0x5b, 0x3e, 0x44
  };

Sample solution

Salting passwords prevents several common attacks on passwords.

If a password is used unsalted, directly as is, then every user in the world who happens to use the password “qwerty”, for instance, will have exactly the same hash for their password (assuming the same algorithm is used). If we used the MD5 hashing algorithm,⁴ then every user who uses the password “qwerty” will get the hash d8578edf8458ce06fbc5bb76a58c5ca4:

$ printf qwerty | md5sum
d8578edf8458ce06fbc5bb76a58c5ca4  -

That means if an attacker happens to get hold of the list of hashed passwords, it’s extremely easy for them to find out what the user’s password is – despite the fact that hashes are “difficult to reverse”.

The attacker knows that many people choose very common passwords and that “qwerty” is one of these, and that the MD5 hash of “qwerty” is d8578edf8458ce06fbc5bb76a58c5ca4. So if the attacker has a list of the hashes of common passwords, they’ll easily recognize them whenever they appear.

Adding a random salt to the password destroys this straightforward correspondence between password and hash.

In a bit more detail – how could attackers try to make use of a list of leaked, but hashed passwords? Let’s assume an attacker has access to a list of hashes for 100,000 of our customers, and wants to obtain the original passwords. There are a few options, depending on whether the hashes are salted or unsalted, and what kind of hash algorithm we used.

Full brute-force with real-time hashing (fast hash algorithm and unsalted hashes only)

Modern GPUs can compute billions of hashes per second using fast hash algorithms like MD5 (e.g., ~67 billion/sec on a mid-range 5-year-old NVIDIA GPU).⁵ This sounds fast, and can be used to brute-force short passwords that use these algorithms with no salt.

But once we consider all alphanumeric passwords up to length 8, that’s roughly 200 trillion passwords,⁶ which would take over 8 hours for one full scan. For large-scale attacks (e.g., cracking 100,000 user hashes), this becomes practically impossible due to enormous total time and cost. So attackers don’t normally rely on pure brute force.

Salts are combined with the password (e.g. concatenated or prepended) before hashing, so they effectively expand the password space (e.g. just a 1-byte salt prepended to our \(\leq\) length 8 passwords expands the password space from 200 trillion to around 5 quadrillion). This makes pure brute force infeasible in most real-world cases.

(And as we know, modern systems shouldn’t be using fast hash algorithms like MD5, but dedicated slow password hashing algorithms like bcrypt, scrypt and Argon2.)

tl;dr: Only feasible for very short, unsalted passwords using fast hash functions like MD5.

Precomputed hashes (simple in-memory lookup table – only for fast hash algorithms, with small hash size, and unsalted hashes)

Leaked lists exist of, say, 10 million popular plaintext passwords (e.g. the RockYou list⁷). Assume each password–hash pair includes the plaintext (say, 10 bytes on average – true for the RockYou list) and an MD5 hash (16 bytes), total storage is then around 300 MB – easily small enough to fit into RAM. This allows for extremely fast hash-to-password lookups via a simple table.

The hashes need to be precomputed, but that doesn’t take long – at 67 billions of hashes per second, it takes less than a millisecond.

The approach breaks down completely, however, if the stored hashes are salted. Each user’s hash depends on both the password and a unique salt. For 10,000 users with unique salts, that’s effectively 10,000 versions of the 300 MB table – about 3 terabytes total. (Even more with larger dictionaries or hash algorithms that produce longer bytestrings. SHA-1 produces 20-byte output. Modern algorithms like scrypt, bcrypt and Argon2 are configurable, but typically are used to produce 32 byte hashes, so 6 terabytes total; and the time to precompute the hashes would be much longer, too.)

tl;dr: Very efficient for unsalted hashes, generated from fast hash algorithms which produce smallish-sized output (e.g. MD5), but completely undermined by salting.

Rainbow tables (in-memory tables with space–time trade-offs)

Rainbow tables are a clever optimisation of precomputed hash attacks. Rather than storing every possible password–hash pair, they perform additional calculations at runtime, but can reduce storage requirements drastically. (Wikipedia gives a fuller explanation, for those who are interested.)

For hash algorithms like MD5 or SHA-1, rainbow tables can compress the entire space of all short (e.g. \(\leq 8\) character) passwords into a few gigabytes, making it feasible to load the table into RAM. Once in memory, lookups are very fast, even with the extra calculations needed.

But there are two drawbacks. Firstly, rainbow tables need to be pre-prepared (we need to precompute all the hashes) – this can take hours or days. Secondly, the entire technique fails if even a single salt byte is added. Since each salt value changes the output hash, a separate rainbow table would be needed for each possible salt. This ends up multiplying storage needs by billions.

Widespread use of salts has rendered rainbow tables obsolete, except when attacking systems that are old or have very poor security. These do still occur, though – in June 2020, the online antiques marketplace LiveAuctioneers suffered a data breach, and was discovered to be storing passwords as unsalted MD5 hashes.⁸

tl;dr Rainbow tables shrink the storage needed for unsalted hash attacks, but are useless against salted hashes.

Salted hashes with slow, modern algorithms

Best-practice password storage uses a unique, random salt per user, and a key-stretching algorithm designed to be expensive to compute, such as bcrypt, scrypt, or Argon2. These algorithms deliberately slow down hashing to make large-scale attacks computationally impractical, even with modern GPUs or custom hardware.⁹

In practice, how many guesses could an attacker attempt? The above algorithms allow system designers to alter the RAM and time needed to compute a single hash.
- With bcrypt at cost factor 12 (default in many libraries), a single GPU can manage only 100–200 guesses per second per target.
- Argon2 or scrypt with high memory usage may reduce this to 10 guesses per second or fewer.
So even if a user has used one of the 10 million most common passwords, attacking the hash of just that one user could take a million seconds, which is about 12 days. We can’t re-use the results of our calculations for other customers, because they all will have a different salt. And spending 12 days each for all the 10,000 customers is completely uneconomic.

This is why salts and slow hashes are important – they reduce targeted, per-user brute-force attacks to a very slow rate, even when the user has used a short, not very secure password – and prevent attackers from re-using the results of previous calculations.

And if a long, unique password is used? Then an attack is not feasible at all – the space of all possible passwords is just too large.

tl;dr: Renders brute force attacks infeasible even for short, bad passwords. For long, good passwords – no chance.

Sample solution

Every hash function outputs results which are of some exact, fixed size; the exact size will depend on the function. (For instance, we’ve seen above that the MD5 algorithm always outputs hashes of 16 bytes.)

The input to a hash function, however, is a sequence (usually of bytes) of arbitrary length. The input domain is therefore infinite, but the output range of the function is finite: hence, for any one hash value, there must always be an infinite number of plaintexts which produce that hash value.

We can see this more straightforwardly if we imagine a hash function that produces outputs of only one byte in length. Such a function would not be very useful for cryptography purposes (can you explain why?), but we could use it for example to distribute items across a hash table of size less than 256.

The output of the function is one byte (256 different values), but it will operate on any arbitrary sequence of input bytes. It therefore follows that for each of the 256 output results, there must be an infinite number of inputs which produce it.

3. CITS3007 project

You can use your lab time to work on the CITS3007 project. You may wish to discuss your project tests and code design with other students or the lab facilitators (although the actual code you submit must be your own, individual work).

Roman soldiers used “watchwords” to identify each other (especially at night, to distinguish allies from potential enemies). These watchwords would often be changed daily for security. See Polybius’s Histories, translated by E.S. Shuckburgh (London: Macmillan, 1889), p 487, available at Project Gutenberg.↩︎
See section 5.1.1, “Memorized secrets” of NIST standard SP 800-63B ↩︎
Sometimes service accounts will authenticate themselves using a password-like value. But often a preferred approach is for them to use a public-private key pair. When the service account needs to authenticate itself to another system, it sends an authentication request; the foreign system provides a randomly generated value (a “challenge”) which the service account then must encrypt with its private key (producing a “response”), proving that it is who it claims to be. This is basically the same method the Git program uses to authenticate itself on your behalf to GitHub if you use an SSH key pair to authenticate – it’s called a challenge–response protocol.↩︎
Note that as per the lectures, MD5 should not be used in practice as a password hashing function; a dedicated function like SCrypt should be used instead.↩︎
See here for benchmarks of the Hashcat tool using unsalted MD5 hashes on an NVIDIA GeForce RTX 3090. Graphics processors are often used for password cracking, because they contain many thousands of cores, so can be used to calculate a large number of hashes in parallel.↩︎
The number of possible characters is \(26 + 26 + 10\) (uppercase, lowercase and digits). So the total number of passwords is \(62^1 + 62^2 + ... + 62^8 \approx 2.2 \times 10^{14}\), or around 200 trillion.↩︎
In 2009, software development company RockYou was breached; they stored a list of 14 million passwords in plain text. (See https://www.kaggle.com/datasets/wjburns/common-password-list-rockyoutxt). It is widely used in security research and password cracking.↩︎
See Troy Hunt, https://x.com/troyhunt/status/1297036195315085313, linking to The Daily Swig cybersecurity news article, “LiveAuctioneers data breach: Millions of cracked passwords for sale, say researchers”↩︎
Field-programmable gate arrays (FPGAs) are more expensive than GPUs, but faster, and tyically have lower power consumption. Custom integrated circuits with a specific hash algorithm “baked in” to them (application-specific integrated circuit, or ASICs) are fastest of all and have even lower power consumption, but are inflexible and expensive to produce – they’re used for tasks like Bitcoin mining due to their speed and low operational costs.↩︎

CITS3007 lab 9 (week 11) – Passwords – solutions

1. Introduction

2. Modern user password best practice

3. Cryptography questions and exercises

3. CITS3007 project