CITS3007 lab 8 (week 10) – Passwords

1 Introduction

Passwords are one of the oldest,¹ most widely used mechanisms for authenticating users – found everywhere from web apps and APIs, to operating systems and IoT devices. Other authentication methods have since become available (such as biometric logins and hardware tokens), but passwords still underpin access control for the vast majority of services.

However, using passwords alone is increasingly considered insecure. Modern best practice favours multi-factor authentication (MFA), where a password is combined with something the user has (like a phone or hardware key) or something they “are” (like a fingerprint). Relying on passwords alone leaves systems vulnerable to “credential stuffing” (attackers re-using stolen passwords) and brute-force attacks.

The way we hash and store passwords has evolved significantly in the past few decades. Simple hashing algorithms like MD5 or SHA-1 are no longer considered acceptable for use in securing systems. Today, secure systems use slow, salted, key-stretching algorithms like bcrypt, scrypt, or Argon2 to make attacks computationally expensive.

Guidelines have also shifted. Prior to the late 2010s, typical advice for generating passwords was that they should:

Much of the burden was placed on users to come up with strong passwords, remember dozens of them, rotate them regularly, and never write them down.

By 2017, many standards bodies, such as NIST, had abandoned these practices. They resulted in users:

In short, as the comic XKCD explains, these practices trained people to use passwords that are hard for humans to remember, but easy for computers to guess.

2 Modern user password best practice

The UK National Cyber Security Centre (NCSC) provides a helpful page of advice for system owners on modern best practices, which is worth reading through.

Service accounts

Note that the above advice only applies to user accounts used by humans. Credentials are also needed by so-called service accounts. These are accounts used by automated tools (such as scripts, bots, or background services) in order to authenticate themselves, typically to other systems.

For instance, if your computer uses a backup program which backs your files up to cloud storage, then it will need some sort of credential – an account name and password, or equivalent³ – for the cloud provider (such as Backblaze or Carbonite) who provides that backup storage.

For service accounts, different considerations apply:

Service accounts often use only one factor to authenticate. (Your backup program can’t ask you to supply your thumbprint every night at 2 a.m.)
Unlike humans, computer programs don’t need their credentials to be “memorable” – they can just as easily use a completely randomly generated string of bytes or characters.
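
As a rough illustration of the second point, a service credential can be drawn straight from the operating system's random number generator. The sketch below assumes a Linux-like system that provides /dev/urandom and the standard head, od and tr utilities; the function name gen_service_secret is made up for this example.

```shell
#!/usr/bin/env bash

# gen_service_secret [NBYTES]
# Print NBYTES (default 32) random bytes from the kernel's random number
# generator, encoded as lower-case hex (so the output is 2*NBYTES
# characters long). Hypothetical helper, for illustration only.
gen_service_secret() {
  local nbytes="${1:-32}"
  # od prints the bytes as space-separated hex pairs; tr removes the
  # spaces and line breaks so that a single unbroken string remains
  head -c "$nbytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}

# Example usage: a 256-bit secret, printed as 64 hex digits
gen_service_secret
```

A value like this is useless to a human trying to memorise it, which is exactly why it suits an automated tool: it would normally be stored in a configuration file or secrets manager rather than typed in.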

OWASP calls accounts like these “Non-Human Identities”, and notes that best practice is still to ensure the credentials they use regularly expire (or are rotated).

3 Cryptography exercises

3.1 Digital signatures and hash collisions

Visit the MD5 Collision Demo page, at https://www.mscs.dal.ca/~selinger/md5collision/. In lectures, we’ve noted that the MD5 hash algorithm has been considered cryptographically broken since 2004. It is vulnerable to collision attacks, where two different inputs produce the same hash value – thus, it should never be used for password hashing, digital signatures, or sensitive data integrity checks.

Follow the provided links on that page to two PostScript documents – the first a letter of recommendation for an intern, and the second an order to grant the intern a security clearance. Download them (e.g. with wget), and confirm that they have the same MD5 hash. (On Linux, the command md5sum <somefile> will display the MD5 hash; on macOS and the BSDs the equivalent command is md5.)
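
One way to carry out this check is sketched below. It assumes GNU coreutils (md5sum and sha256sum) are installed, and the function name hashes_match is invented here. The file names letter_of_rec.ps and order.ps in the usage comment are also an assumption about what the demo page links to, so adjust them to match what you actually download.

```shell
#!/usr/bin/env bash

# hashes_match TOOL FILE1 FILE2
# Succeed (exit status 0) if both files have the same digest under TOOL,
# where TOOL is a coreutils-style hashing command such as md5sum or
# sha256sum. Hypothetical helper, for illustration only.
hashes_match() {
  local tool="$1" h1 h2
  # Reading the file from standard input makes the tool print
  # "<digest>  -", so we keep only the first space-separated field
  h1=$("$tool" < "$2" | cut -d ' ' -f 1)
  h2=$("$tool" < "$3" | cut -d ' ' -f 1)
  [ -n "$h1" ] && [ "$h1" = "$h2" ]
}

# Example usage (file names are assumed; nothing here runs automatically):
#   hashes_match md5sum    letter_of_rec.ps order.ps && echo "MD5 digests match"
#   hashes_match sha256sum letter_of_rec.ps order.ps || echo "SHA-256 digests differ"
#   cmp letter_of_rec.ps order.ps    # confirms the two files really are different
```

Comparing the same pair of files under a second hash function is a quick way to convince yourself that the documents differ even though their MD5 hashes agree.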

What exactly is a digital signature? In essence, it’s an assertion of the form:

But how do we make such an assertion? We need a way of doing so which also allows the assertion to be easily checked. The trick is: rather than working with the whole document contents, we pass it through a hash function. This produces a short, fixed-size value that acts like a “fingerprint” of the document – changing even a single byte of the document will result in a radically different hash value.
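
The “fingerprint” behaviour is easy to observe from the command line. The sketch below assumes the sha256sum command from GNU coreutils is available (any cryptographic hash would do; SHA-256 is chosen here only as an example), and the helper name digest_of is made up.

```shell
#!/usr/bin/env bash

# digest_of STRING
# Print the SHA-256 digest of STRING as 64 lower-case hex digits.
# Hypothetical helper, for illustration only.
digest_of() {
  # printf '%s' avoids adding a trailing newline, which would change the hash
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Two inputs that differ in a single character...
digest_of "Pay Alice 100 dollars"
digest_of "Pay Alice 900 dollars"
# ...produce two 64-digit values with no visible relationship to each other
```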

When someone “signs” a document, what they actually sign is this hash value. So the claim they are making is really: “I, person A, signed a document whose hash is H.” For a good cryptographic hash function, it is easy to compute the hash of a document, but infeasible to find a different document with the same hash. That means the hash uniquely (for practical purposes) identifies the document.
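
To make this concrete, the sketch below signs and verifies a file with the openssl command-line tool, which hashes the file with SHA-256 and then signs that digest. It assumes OpenSSL is installed; the function names, and the choice of a 2048-bit RSA key, are illustrative assumptions rather than recommendations.

```shell
#!/usr/bin/env bash

# make_keypair PRIVATE_KEY PUBLIC_KEY
# Generate an RSA key pair (the 2048-bit size is an illustrative choice).
make_keypair() {
  openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
    -out "$1" 2>/dev/null &&
    openssl pkey -in "$1" -pubout -out "$2"
}

# sign_file PRIVATE_KEY FILE SIGNATURE
# Hash FILE with SHA-256, then sign the digest with the private key.
sign_file() {
  openssl dgst -sha256 -sign "$1" -out "$3" "$2"
}

# verify_file PUBLIC_KEY FILE SIGNATURE
# Re-hash FILE and check the signature against that digest.
# Succeeds (exit status 0) only if the signature matches.
verify_file() {
  openssl dgst -sha256 -verify "$1" -signature "$3" "$2" >/dev/null 2>&1
}

# Example usage (file names are placeholders):
#   make_keypair key.pem pub.pem
#   sign_file key.pem contract.txt contract.sig
#   verify_file pub.pem contract.txt contract.sig && echo "signature is valid"
```

Because only the digest is signed, the signature is exactly as trustworthy as the hash function: any second document with the same digest would verify against the same signature.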

By 2005, shortly after MD5 was first “broken”, researchers had shown it was possible to generate a collision (like the PostScript documents linked to) on a typical home PC in several hours. Today, generating an MD5 collision typically takes only minutes or seconds (and if an attacker has access to GPUs to do the computation, it is effectively instantaneous).⁴

Generating hash collisions

Technically, finding two meaningful documents (two PDFs, say) with the same hash is harder than just finding a collision. Finding an MD5 collision means only that you have found two binary strings – which will just look like random binary junk – that both produce the same hash. Producing controlled, meaningful messages (documents or programs, say) is a harder task, called a “chosen-prefix collision” attack; but for MD5, that too is quite feasible on a home computer or laptop.

If you are interested in generating your own MD5 collisions or chosen-prefix collisions, software to do so is available at https://github.com/cr-marcstevens/hashclash, but compiling and running it is not an essential part of this lab.

A lab worksheet that works through the process of generating collisions is available at the SEED Security Labs site.

3.2 Checking for password breaches

Do you feel safe entering your password into the Pwned Passwords site? Perhaps not. Fortunately, you don’t have to. Save the following in your development environment as passcheck.sh, and make it executable with chmod a+rx passcheck.sh:

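  #!/usr/bin/env bash

  baseurl="https://api.pwnedpasswords.com/range"

  # Read the password without echoing it; -r keeps backslashes literal
  read -r -s pass

  # printf (unlike echo -n) is safe for passwords that contain backslashes
  # or start with "-"; awk keeps only the 40 hex digits of the hash
  hash=$(printf '%s' "$pass" | sha1sum | awk '{print $1}')
  hashhead=${hash:0:5}
  hashtail=${hash:5:35}

  # "^^" converts to uppercase -- needed because sha1sum uses _lower_-case
  # hex digits for its hash, but the service returns upper-case suffixes
  curl -s "${baseurl}/${hashhead}" | grep "${hashtail^^}"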

The Pwned Passwords site relies on a system called “k-anonymity”⁵ – you never actually send your plaintext password to the website, or even the full hash. Rather, you just send a small prefix of your hash (in this case, the first 5 characters) to the service, and it returns a list of all the suffixes (in this case, the last 35 characters) of breached password hashes that start with that prefix. So a full password hash is never sent between you and the server, or seen by the server.

The prefix you send is far too short to uniquely identify your password’s hash – many different hashes will share the same prefix, so the server never sees enough information to reconstruct or recognise your specific password.

And you can then check, locally, “Does my password match any of them exactly?” Your query is effectively “hidden” amongst many possible candidates, so your actual password (or its full hash) is never revealed.
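
The local check can also be written so that it reports the breach count directly. The sketch below assumes the service returns one SUFFIX:COUNT pair per line, with upper-case hex digits and Windows-style line endings; the function name pwned_count is made up for this example.

```shell
#!/usr/bin/env bash

# pwned_count SUFFIX
# Read a range response on standard input and print the breach count
# recorded for SUFFIX (the last 35 hex digits of a SHA-1 hash), or 0 if
# the suffix is not listed. Hypothetical helper, for illustration only.
pwned_count() {
  local suffix
  # The response uses upper-case hex, so normalise the suffix first
  suffix=$(printf '%s' "$1" | tr 'a-f' 'A-F')
  # Strip carriage returns, then compare the whole suffix field exactly;
  # appending "" forces awk to compare as strings rather than as numbers
  tr -d '\r' | awk -F: -v s="$suffix" '
    ($1 "") == (s "") { n = $2 }
    END { print (n == "" ? 0 : n) }'
}

# Example usage, reusing the variables from passcheck.sh:
#   curl -s "${baseurl}/${hashhead}" | pwned_count "$hashtail"
```

Matching on the whole suffix field, rather than on any line that happens to contain the text, keeps the comparison exact, and printing 0 for a miss makes the result easier to use in other scripts.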

Run the script and type in the password “password”. You should see a single line as output, consisting of a hash suffix followed by a count, indicating that the password “password” has been found over 52 million times in known data breaches. The pass-phrase “correct horse battery staple” had never been used before Randall Munroe used it in his comic XKCD – how many times has it now been found in known data breaches?

4 Cryptography questions

See if you can answer the following questions, after reviewing the material on cryptography in the lectures.

¹ Roman soldiers used “watchwords” to identify each other (especially at night, to distinguish allies from potential enemies). These watchwords would often be changed daily for security. See Polybius’s Histories, translated by E.S. Shuckburgh (London: Macmillan, 1889), p. 487, available at Project Gutenberg.
² See section 5.1.1, “Memorized secrets”, of NIST standard SP 800-63B.
³ Sometimes service accounts will authenticate themselves using a password-like value. But often a preferred approach is for them to use a public-private key pair. When the service account needs to authenticate itself to another system, it sends an authentication request; the foreign system provides a randomly generated value (a “challenge”) which the service account then must sign with its private key (producing a “response”), proving that it is who it claims to be. This is basically the same method the Git program uses to authenticate itself on your behalf to GitHub if you use an SSH key pair to authenticate – it’s called a challenge–response protocol.
⁴ You might ask – what exactly does it mean to “generate a collision”? It means that you can find two files which produce the same hash – but there are no other constraints on what the files need to contain. If you need the files to, say, both be valid PNG files, or both be human-readable text, the task becomes harder.
Other attacks besides “collision attacks” are “preimage attacks” and “second preimage attacks”. In a “preimage attack”, you know the hash of some file X, but don’t have access to X itself, and your task is to find some file which produces that hash. In a “second preimage attack”, you have some file X – and therefore can easily calculate its hash – and want to find a second, different file which produces the same hash. MD5 is still considered resistant in practice to these latter two kinds of attack; it is only collision attacks it is vulnerable to. Nevertheless, that is enough to mean that it shouldn’t be used for any kind of cryptographic purpose.
⁵ See “Validating Leaked Passwords with k-Anonymity” and https://haveibeenpwned.com/API/v3#PwnedPasswords for more information on how this works.
