Encrypting/Hashing plain text passwords in database [closed]
A

14

62

I've inherited a web app that I've just discovered stores over 300,000 usernames/passwords in plain text in a SQL Server database. I realize that this is a Very Bad Thing™.

Knowing that I'll have to update the login and password update processes to encrypt/decrypt, and with the smallest impact on the rest of the system, what would you recommend as the best way to remove the plain text passwords from the database?

Any help is appreciated.

Edit: Sorry if I was unclear, I meant to ask what would be your procedure to encrypt/hash the passwords, not specific encryption/hashing methods.

Should I just:

  1. Make a backup of the DB
  2. Update login/update password code
  3. After hours, go through all records in the users table hashing the password and replacing each one
  4. Test to ensure users can still login/update passwords

I guess my concern is more from the sheer number of users so I want to make sure I'm doing this correctly.

Alecto answered 13/11, 2008 at 17:0 Comment(9)
Very Bad Thing(tm) :)Bangup
I don't think that you've worked your question well. You want to know how to get data out of a relational database? That would be a select statement.Comenius
I know how to retrieve records, this is more of a procedural question... Sorry if that was unclear.Alecto
Have you got hold of the Reddit user DB? ;-)Milden
Possible duplicate of Secure Password HashingSubcartilaginous
@Gilles - I'm not sure how this 7 year old question with over 34,000 views is a duplicate of a 6 year old question with under 5,000 views but apparently you think it is. I agree that I wouldn't ask this as a new question on SO today, but this was asked before all of the other Programming-related Stack Exchange sites existed. The question itself is more about the process of migrating from plain-text to more secure passwords, not the specific implementation of encryption/hashing methods.Alecto
@JonathanS. Just because we've let duplicates stick around for a few years doesn't mean we should keep them open when we find them. And I don't see how the existence of other SE sites is relevant.Subcartilaginous
Also, since you're still around, I suggest switching the accepted answer to a good one — not one that claims that passwords are “encrypted”, or that MD5 is a decent choice for a password hash. stackoverflow.com/a/287883 and stackoverflow.com/a/287738 are better answers as they let on that a password hash must be salted and slow.Subcartilaginous
@Gilles - My mention of other SE sites was to reiterate that this was more of a process/procedure question that I would have likely asked on programmers.stackexchange.com, had it existed when I originally asked. I still don't believe this is a duplicate as I wasn't asking which hashing algorithm should be used. The current accepted answer was the best answer at the time that actually discussed a process for migrating from plain-text passwords to a more secure implementation.Alecto
B
19

I would imagine you will have to add a column to the database for the encrypted password, then run a batch job over all records which gets the current password, encrypts it (as others have mentioned, a hash like md5 is pretty standard; edit: but it should not be used on its own - see other answers for good discussions), stores it in the new column and checks it all happened smoothly.

Then you will need to update your front-end to hash the user-entered password at login time and verify that vs the stored hash, rather than checking plaintext-vs-plaintext.

It would seem prudent to me to leave both columns in place for a little while to ensure that nothing hinky has gone on, before eventually removing the plaintext passwords altogether.

Don't forget also that any code that accesses the password will have to change, such as password change / reminder requests. You will of course lose the ability to email out forgotten passwords, but this is no bad thing. You will have to use a password reset system instead.

Edit: One final point, you might want to consider avoiding the error I made on my first attempt at a test-bed secure login website:

When processing the user password, consider where the hashing takes place. In my case the hash was calculated by the PHP code running on the webserver, but the password was transmitted to the page from the user's machine in plaintext! This was ok(ish) in the environment I was working in, as it was inside an https system anyway (uni network). But, in the real world I imagine you would want to hash the password before it leaves the user system, using javascript etc. and then transmit the hash to your site.

Bangup answered 13/11, 2008 at 17:7 Comment(7)
Thanks, while I don't like keeping the passwords around, they have already been around for years... The system sends out several emails that include passwords so I'll have to look at those before deciding on using a hash.Alecto
What happens if they have javascript turned off?Hyperphysical
You can't hash the password on the user's machine. The hashing has to be done by a trusted system. (Otherwise anyone that's stolen a copy of the password table can just send you the hash; the hash has become the password.) But, yes, this does require a secure transport like HTTPS from the user.Almira
@Malfist: That is quickly turning into a historical concern. Very, very few people disable js. In that case, though, I would send the unhashed pass to the server and accommodate for that in server-side code. It would simply be a less-ideal fallback.Bicycle
@erickson: for those who are particularly paranoid, you can store the pass double-hashed in the DB and accept a single-hashed pass from the client.Bicycle
Either hash or don't hash. Hashing on the client is pointless. The threat model you're addressing with a hash is exposure of the password database. If that's not a concern, don't hash at all. Otherwise, you can't authenticate using a hash produced by the user (or read from a stolen database copy).Almira
It may be uber-paranoid, I was just mentioning something that I didn't address properly at the time. As I said, I don't have real-world, active experience of best practice in this area but it is something worth considering.Bangup
D
50

EDIT (2016): use Argon2, scrypt, bcrypt, or PBKDF2, in that order of preference. Use as large a slowdown factor as is feasible for your situation. Use a vetted existing implementation. Make sure you use a proper salt (although the libraries you're using should be making sure of this for you).


When you hash the passwords, DO NOT USE PLAIN MD5.

Use PBKDF2, which basically means using a random salt to prevent rainbow table attacks, and iterating (re-hashing) enough times to slow the hashing down - not so much that your application takes too long, but enough that an attacker brute-forcing a large number of different passwords will notice.

From the document:

  • Iterate at least 1000 times, preferably more - time your implementation to see how many iterations are feasible for you.
  • 8 bytes (64 bits) of salt are sufficient, and the random doesn't need to be secure (the salt is unencrypted, we're not worried someone will guess it).
  • A good way to apply the salt when hashing is to use HMAC with your favorite hash algorithm, using the password as the HMAC key and the salt as the text to hash (see this section of the document).

Example implementation in Python, using SHA-256 as the secure hash:

EDIT: as mentioned by Eli Collins this is not a PBKDF2 implementation. You should prefer implementations which stick to the standard, such as PassLib.

# Python 3 version of the original example (NOT a standard PBKDF2, see the note above)
from base64 import b64decode, b64encode
from hashlib import sha256
from hmac import HMAC
import random

def random_bytes(num_bytes):
  # random is not a cryptographic generator; os.urandom(num_bytes) is the safer choice
  return bytes(random.randrange(256) for i in range(num_bytes))

def pbkdf_sha256(password, salt, iterations):
  result = password.encode("utf-8") # HMAC works on bytes, not str
  for i in range(iterations):
    result = HMAC(result, salt, sha256).digest() # use HMAC to apply the salt
  return result

NUM_ITERATIONS = 5000
def hash_password(plain_password):
  salt = random_bytes(8) # 64 bits

  hashed_password = pbkdf_sha256(plain_password, salt, NUM_ITERATIONS)

  # return the salt and hashed password, encoded in base64 and split with ","
  return b64encode(salt).decode("ascii") + "," + b64encode(hashed_password).decode("ascii")

def check_password(saved_password_entry, plain_password):
  # split the stored entry back into its salt and hash parts
  salt, hashed_password = saved_password_entry.split(",")
  salt = b64decode(salt)
  hashed_password = b64decode(hashed_password)

  # re-hash the supplied password with the stored salt and compare
  return hashed_password == pbkdf_sha256(plain_password, salt, NUM_ITERATIONS)

password_entry = hash_password("mysecret")
print(password_entry) # will print, for example: 8Y1ZO8Y1pi4=,r7Acg5iRiZ/x4QwFLhPMjASESxesoIcdJRSDkqWYfaA=
check_password(password_entry, "mysecret") # returns True
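
For comparison, here is a minimal sketch of the same store-and-check flow built on a standard PBKDF2 implementation, `hashlib.pbkdf2_hmac` from the Python 3 standard library. The function names, the iteration count, the 16-byte salt and the `iterations,salt,hash` storage format are assumptions chosen for illustration, not values from the answer; time the iteration count on your own hardware.

```python
import base64
import hashlib
import hmac
import os

NUM_ITERATIONS = 100_000  # assumed value; time it on your own hardware


def hash_password_pbkdf2(plain_password, iterations=NUM_ITERATIONS):
    """Return "iterations,salt,hash" with salt and hash base64-encoded."""
    salt = os.urandom(16)  # per-user random salt from the OS CSPRNG
    derived = hashlib.pbkdf2_hmac(
        "sha256", plain_password.encode("utf-8"), salt, iterations)
    return ",".join([
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(derived).decode("ascii"),
    ])


def check_password_pbkdf2(saved_entry, plain_password):
    """Recompute the hash with the stored salt and iteration count."""
    iterations, salt_b64, hash_b64 = saved_entry.split(",")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", plain_password.encode("utf-8"), salt, int(iterations))
    # compare_digest avoids leaking how many leading bytes matched
    return hmac.compare_digest(candidate, expected)
```

Storing the iteration count next to the salt means the count can be raised later without invalidating existing entries.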
Dramatize answered 13/11, 2008 at 19:2 Comment(6)
I've always understood that hashing a hash is not something you should do, as the possibility of hash collision increase with each iteration. But does this hash(salt+hash) circumvent this? The amount of characters aren't all that many, after all...Nightrider
You're right, re-hashing may reduce the search space (salt doesn't help), but this is irrelevant for password-based cryptography. To reach the 256-bit search space of this hash you'd need a completely random password, 40 characters long, from all available keyboard characters (log2(94^40)).Dramatize
People should be aware that this code does NOT implement the PBKDF2 algorithm; but rather is a non-standard variation of the older PBKDF1 function, modified to use a PRF (HMAC-SHA256 in this case). See rfc2898 for the reference implementation of both kdfs. While this algorithm is probably not insecure, it's not byte-compatible with either PBKDF1 or PBKDF2, nor has its exact behavior been given the same security review - I'm concerned with the fact that it applies HMAC to a fixed salt, and varies the password instead - this may weaken HMAC.Psychomotor
@Eli: Not completely disagreeing, since PBKDF2 creates keys of arbitrary length and this code doesn't. This has no meaning, of course, in a password-security scheme. But the text in the rfc you linked to (tools.ietf.org/html/rfc2898#appendix-B.1.1) explicitly mentions using the password as HMAC's "key" and the salt as HMAC's "text", which is - by intention - what this example code does.Dramatize
@orip: Cryptography is not something where close enough is usually a good idea; especially if people mistake this for a PBKDF2 implementation, only to find out later that the output doesn't match existing code/data. It's true, if the code's salt/password flaw were fixed, it would be more in line with the appendix, but that's only describing how to use HMAC in PBKDF2; not how PBKDF2 works. Aside from omitting the variable-keylen portion, the most important issue is that the code above completely omits the XOR part of the F() function in PBKDF2 - which is central to its preimage resistance.Psychomotor
Just to add - for most algorithm discussions, individual implementations can vary, so long as they achieve the desired effect. But PBKDF2 is a carefully designed algorithm, with test vectors specifying the exact behavior; and it's in a problem space where slight changes can mean serious decreases in security. In most other cases, I wouldn't have thought any of this even worth mentioning :)Psychomotor
A
38

The basic strategy is to use a key derivation function to "hash" the password with some salt. The salt and the hash result are stored in the database. When a user inputs a password, the salt and their input are hashed in the same way and compared to the stored value. If they match, the user is authenticated.

The devil is in the details. First, a lot depends on the hash algorithm that is chosen. A key derivation algorithm like PBKDF2, based on a hash-based message authentication code, makes it "computationally infeasible" to find an input (in this case, a password) that will produce a given output (what an attacker has found in the database).

A pre-computed dictionary attack uses a pre-computed index, or dictionary, from hash outputs to passwords. Hashing is slow (or it's supposed to be, anyway), so the attacker hashes all of the likely passwords once, and stores the result indexed in such a way that given a hash, he can look up a corresponding password. This is a classic tradeoff of space for time. Since password lists can be huge, there are ways to tune the tradeoff (like rainbow tables), so that an attacker can give up a little speed to save a lot of space.

Pre-computation attacks are thwarted by using "cryptographic salt". This is some data that is hashed with the password. It doesn't need to be a secret, it just needs to be unpredictable for a given password. For each value of salt, an attacker would need a new dictionary. If you use one byte of salt, an attacker needs 256 copies of their dictionary, each generated with a different salt. First, he'd use the salt to look up the correct dictionary, then he'd use the hash output to look up a usable password. But what if you add 4 bytes? Now he needs 4 billion copies of the dictionary. By using a large enough salt, a dictionary attack is precluded. In practice, 8 to 16 bytes of data from a cryptographic quality random number generator makes a good salt.
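
A small sketch of why salt defeats pre-computation: the same password hashed with two different salts gives unrelated outputs, so a table built for one salt is useless against the other. It uses `hashlib.pbkdf2_hmac` from the Python standard library; the helper name `derive`, the sample password, the salt size and the iteration count are assumptions for illustration.

```python
import hashlib
import os


def derive(password, salt, iterations=10_000):
    """Hash a password with the given salt using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


password = "letmein"
salt_a = os.urandom(16)  # 16 random bytes from the OS CSPRNG
salt_b = os.urandom(16)

hash_a = derive(password, salt_a)
hash_b = derive(password, salt_b)

# Same password, different salts: the stored hashes differ, so a dictionary
# precomputed for salt_a says nothing about entries hashed with salt_b.
print(hash_a.hex())
print(hash_b.hex())
print(hash_a == hash_b)

# Same password and same salt reproduces the hash, which is what login relies on.
print(derive(password, salt_a) == hash_a)
```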

With pre-computation off the table, an attacker has to compute the hash on each attempt. How long it takes to find a password now depends entirely on how long it takes to hash a candidate. This time is increased by iteration of the hash function. The number of iterations is generally a parameter of the key derivation function; today, a lot of mobile devices use 10,000 to 20,000 iterations, while a server might use 100,000 or more. (The bcrypt algorithm uses the term "cost factor", which is a logarithmic measure of the time required.)
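
One hedged way to pick an iteration count is to time the key derivation function on the target hardware and scale up until a single hash takes about as long as you can tolerate. The sketch below does this for PBKDF2-HMAC-SHA256; the helper name `calibrate_iterations`, the target times and the starting count are assumptions for illustration.

```python
import hashlib
import os
import time


def calibrate_iterations(target_seconds=0.1, start=10_000):
    """Double the PBKDF2 iteration count until one hash takes target_seconds."""
    salt = os.urandom(16)
    iterations = start
    while True:
        began = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark password", salt, iterations)
        elapsed = time.perf_counter() - began
        if elapsed >= target_seconds:
            return iterations
        iterations *= 2


# The result depends on the machine, so no fixed output is shown here.
print(calibrate_iterations(target_seconds=0.05))

# A logarithmic cost factor in the bcrypt style: cost 12 means 2**12 rounds.
print(2 ** 12)
```

Because the count doubles on each pass, the result overshoots the target by at most a factor of two, which is usually close enough for a tuning parameter.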

Almira answered 13/11, 2008 at 18:13 Comment(3)
As far as hash generation being supposed to be slow - that depends on the use of the hash. For password storage, that's a desirable quality. For message authentication, it may not be (particularly if the messages being authenticated are network packets).Favela
Good point; slowness is important for protecting against an offline attack. In a network protocol, a man-in-the-middle probably wouldn't have time to find a collision even with a very fast hash, as long as it wasn't broken.Almira
A well-worded explanation of password encryption and terms, thanks.Apfel
F
4

Follow Xan's advice of keeping the current password column around for a while so if things go bad, you can rollback quick-n-easy.

As far as encrypting your passwords:

  • use a salt
  • use a hash algorithm that's meant for passwords (i.e., it's slow)

See Thomas Ptacek's Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes for some details.

Favela answered 13/11, 2008 at 17:18 Comment(0)
S
3

I think you should do the following:

  1. Create a new column called HASHED_PASSWORD or something similar.
  2. Modify your code so that it checks for both columns.
  3. Gradually migrate passwords from the non-hashed table to the hashed one. For example, when a user logs in, migrate his or her password automatically to the hashed column and remove the unhashed version. All newly registered users will have hashed passwords.
  4. After hours, you can run a script which migrates n users at a time.
  5. When you have no more unhashed passwords left, you can remove your old password column (you may not be able to do so, depends on the database you are using). Also, you can remove the code to handle the old passwords.
  6. You're done!
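
The login-time migration in step 3 can be sketched roughly like this. A plain dict stands in for one row of the users table, and the keys `password`, `salt` and `hashed_password`, the function names and the iteration count are all hypothetical; PBKDF2 via `hashlib.pbkdf2_hmac` is used here as one reasonable choice of password hash.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed value


def make_hash(password):
    """Return (salt, derived_key) for a new or migrated password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def login(user, supplied_password):
    """Check a login against either column and migrate plain-text rows on success."""
    supplied = supplied_password.encode("utf-8")
    if user["hashed_password"] is not None:
        candidate = hashlib.pbkdf2_hmac("sha256", supplied, user["salt"], ITERATIONS)
        return hmac.compare_digest(candidate, user["hashed_password"])

    # Legacy row: compare against the plain-text column one last time.
    if user["password"] is None:
        return False
    if not hmac.compare_digest(user["password"].encode("utf-8"), supplied):
        return False

    # Successful legacy login: store the hash and drop the plain text.
    user["salt"], user["hashed_password"] = make_hash(supplied_password)
    user["password"] = None
    return True


row = {"password": "mysecret", "salt": None, "hashed_password": None}
print(login(row, "mysecret"))  # legacy path, migrates the row
print(row["password"])         # the plain-text value has been cleared
print(login(row, "mysecret"))  # hashed path
```

Users who never log in are not reached this way, which is why the after-hours batch script in step 4 is still needed.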
Staple answered 13/11, 2008 at 17:19 Comment(0)
L
2

As the others mentioned, you don't want to decrypt if you can help it. Standard best practice is to encrypt using a one-way hash, and then when the user logs in hash their password to compare it.

Otherwise you'll have to use a strong encryption to encrypt and then decrypt. I'd only recommend this if the political reasons are strong (for example, your users are used to being able to call the help desk to retrieve their password, and you have strong pressure from the top not to change that). In that case, I'd start with encryption and then start building a business case to move to hashing.

Lakisha answered 13/11, 2008 at 17:6 Comment(0)
G
2

For authentication purposes you should avoid storing the passwords using reversible encryption, i.e. you should only store the password hash and check the hash of the user-supplied password against the hash you have stored. However, that approach has a drawback: it's vulnerable to rainbow table attacks, should an attacker get hold of your password store database.

What you should do is store the hashes of a pre-chosen (and secret) salt value + the password. I.e., concatenate the salt and the password, hash the result, and store this hash. When authenticating, do the same - concatenate your salt value and the user-supplied password, hash, then check for equality. This makes rainbow table attacks unfeasible.

Of course, if the user sends passwords across the network (for example, if you're working on a web or client-server application), then you should not send the password in clear text across, so instead of storing hash(salt + password) you should store and check against hash(salt + hash(password)), and have your client pre-hash the user-supplied password and send that one across the network. This protects your user's password as well, should the user (as many do) re-use the same password for multiple purposes.

Gretagretal answered 13/11, 2008 at 17:9 Comment(3)
Salt does not need to be secret, and it's burdensome to keep its secrecy.Almira
Also, to be clear - the salt should be different and random for each instance. Not pre-chosen once and used for all hashes.Favela
Mike is of course right in principle, however it's not always possible to change the salt every time (depending on the app specifics), in which case the salt must be kept secret.Stocky
C
1
  • Encrypt using something like MD5, encode it as a hex string
  • You need a salt; in your case, the username can be used as the salt (it has to be unique, the username should be the most unique value available ;-)
  • use the old password field to store the MD5, but tag the MD5 (e.g. "MD5:687A878....") so that old (plain text) and new (MD5) passwords can co-exist
  • change the login procedure to verify against the MD5 if there is an MD5, and against the plain password otherwise
  • change the "change password" and "new user" functions to create MD5'ed passwords only
  • now you can run the conversion batch job, which might take as long as needed
  • after the conversion has been run, remove the legacy-support
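
The tagging idea in the list above can be sketched as follows. Note that this sketch swaps the answer's MD5 for PBKDF2-HMAC-SHA256 with a random per-user salt, in line with the other answers; the `PBKDF2:` tag, the field layout, the function names and the iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import os

TAG = "PBKDF2:"       # hypothetical marker for converted entries
ITERATIONS = 100_000  # assumed value


def encode_entry(password):
    """Build a tagged entry: PBKDF2:<iterations>:<salt hex>:<hash hex>."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{TAG}{ITERATIONS}:{salt.hex()}:{digest.hex()}"


def verify_entry(stored, supplied_password):
    """Verify against a tagged hash if present, else against legacy plain text."""
    supplied = supplied_password.encode("utf-8")
    if stored.startswith(TAG):
        iterations, salt_hex, digest_hex = stored[len(TAG):].split(":")
        candidate = hashlib.pbkdf2_hmac(
            "sha256", supplied, bytes.fromhex(salt_hex), int(iterations))
        return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
    # Legacy support: remove this branch once the batch conversion has run.
    return hmac.compare_digest(stored.encode("utf-8"), supplied)
```

One caveat of any in-band tag: a legacy plain-text password that happens to start with the tag would be misread as a hash, so it is worth checking the existing data for that prefix before relying on it.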
Chloroprene answered 13/11, 2008 at 17:21 Comment(4)
It's also common to use a random salt for each user and store it alongside the hashed password.Recreation
Something that is unknown to the user would make a more secure salt. Perhaps an internal userID or, as Michael suggested, a specially created salt value. If you're using a publicly available unique salt, like username, you should probably also salt with a constant just for good measure.Modigliani
As I understand it, salting has the purpose of preventing dictionary attacks (pre-computing the hashes of popular passwords and comparing users to them). The salt is always visible, it is not a secret. So why not use the user name, since it is already known, and guaranteed to be unique?Chloroprene
User name is not a bad salt, if you consider a single system. But it would probably be worthwhile for an attacker like a repressive government to make dictionaries for the most common user names to increase their chances of breaking into multiple sites. It's better to choose and store a random salt.Almira
A
1

Step 1: Add encrypted field to database

Step 2: Change code so that when password is changed, it updates both fields but logging in still uses old field.

Step 3: Run script to populate all the new fields.

Step 4: Change code so that logging in uses new field and changing passwords stops updating old field.

Step 5: Remove unencrypted passwords from database.

This should allow you to accomplish the changeover without interruption to the end user.
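
A rough sketch of steps 1, 3 and 5 as a batch job. It uses an in-memory SQLite database from the Python standard library as a stand-in for the real SQL Server table; the schema, the column names, the batch size and the iteration count are hypothetical, and PBKDF2 via `hashlib.pbkdf2_hmac` is one reasonable choice of password hash.

```python
import hashlib
import os
import sqlite3

ITERATIONS = 100_000  # assumed value


def migrate_batch(conn, batch_size=1000):
    """Hash up to batch_size plain-text rows; return how many were converted."""
    rows = conn.execute(
        "SELECT id, password FROM users "
        "WHERE hashed_password IS NULL AND password IS NOT NULL LIMIT ?",
        (batch_size,),
    ).fetchall()
    for user_id, password in rows:
        salt = os.urandom(16)  # fresh random salt per user
        digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
        conn.execute(
            "UPDATE users SET salt = ?, hashed_password = ? WHERE id = ?",
            (salt, digest, user_id),
        )
    conn.commit()
    return len(rows)


# Demo on an in-memory table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users (username, password) VALUES (?, ?)",
    [("alice", "mysecret"), ("bob", "hunter2"), ("carol", "letmein")],
)

# Step 1: add the new fields next to the old one.
conn.execute("ALTER TABLE users ADD COLUMN salt BLOB")
conn.execute("ALTER TABLE users ADD COLUMN hashed_password BLOB")

# Step 3: populate the new fields in batches until nothing is left.
while migrate_batch(conn, batch_size=2):
    pass

# Step 5: clear the plain-text field once logins use the new one.
conn.execute("UPDATE users SET password = NULL")
conn.commit()
```

Since a deliberately slow hash is applied to every row, a table with hundreds of thousands of users can take hours to convert; working in committed batches lets the job be stopped and resumed.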

Also: Something I would do is name the new database field something that is completely unrelated to password like "LastSessionID" or something similarly boring. Then instead of removing the password field, just populate with hashes of random data. Then, if your database ever gets compromised, they can spend all the time they want trying to decrypt the "password" field.

This may not actually accomplish anything, but it's fun thinking about someone sitting there trying to figure out worthless information.

Africander answered 14/11, 2008 at 17:25 Comment(1)
Good point about Step 2 being needed early in the process. I share your enjoyment of the dummy password field too :)Rummage
E
0

As with all security decisions, there are tradeoffs. If you hash the password, which is probably your easiest move, you can't offer a password retrieval function that returns the original password, nor can your staff look up a person's password in order to access their account.

You can use symmetric encryption, which has its own security drawbacks. (If your server is compromised, the symmetric encryption key may be compromised also).

You can use public-key encryption, and run password retrieval/customer service on a separate machine which stores the private key in isolation from the web application. This is the most secure, but requires a two-machine architecture, and probably a firewall in between.

Evulsion answered 13/11, 2008 at 17:10 Comment(5)
Inability of staff to look up a user's password is a feature, not a drawback. Support of password retrieval isn't a tradeoff, it's a surrender.Almira
I think that's an unfortunately absolutist position to take on this. It really depends on the security value of the information protected by that password.Evulsion
It also depends further on the size and trustworthiness of the staff in question. Password lookup makes much more sense in a small organization than in a large one.Evulsion
You never know what a given password protects; expect users to reuse passwords for their bank on your toy web app. Even if your staff is trustworthy (even if it's just YOU), you can't rule out the possibility of an external attacker getting at your password database.Almira
If staff has a legitimate need to access another user's account, that capability should be built into the system without the staff needing to log in as the user. And the perceived need for a 'password retrieval' system can be replaced by a 'password reset' system.Favela
E
0

MD5 and SHA1 have shown a bit of weakness (two words can result in the same hash) so using SHA256-SHA512 / iterative hashes is recommended to hash the password.

I would write a small program in the language that the application is written in that goes and generates a random salt that is unique for each user and a hash of the password. The reason I tend to use the same language as the verification is that different crypto libraries can do things slightly differently (i.e. padding) so using the same library to generate the hash and verify it eliminates that risk. This application could also then verify the login after the table has been updated if you want as it knows the plain text password still.

  1. Don't use MD5/SHA1
  2. Generate a good random salt (many crypto libraries have a salt generator)
  3. An iterative hash algorithm as orip recommended
  4. Ensure that the passwords are not transmitted in plain text over the wire
Edmundoedmunds answered 14/11, 2008 at 16:45 Comment(0)
S
0

I would like to suggest one improvement to the great python example posted by Orip. I would redefine the random_bytes function to be:

def random_bytes(num_bytes):
    return os.urandom(num_bytes)

Of course, you would have to import the os module. The os.urandom function provides a random sequence of bytes that can be safely used in cryptographic applications. See the reference help of this function for further details.

Stereo answered 22/4, 2009 at 17:37 Comment(0)
D
-1

To hash the password you can use the HashBytes function. It returns a varbinary, so you'd have to create a new column and then delete the old varchar one.

Like

ALTER TABLE users ADD hashedPassword varbinary(max);
ALTER TABLE users ADD salt char(10);
--Generate random salts and update the column, after that
UPDATE users SET hashedPassword = HashBytes('SHA1',salt + '|' + password);

Then you modify the code to validate the password, using a query like

SELECT count(*) from users WHERE hashedPassword = 
HashBytes('SHA1',salt + '|' + <password>)

where <password> is the value entered by the user.

Demob answered 13/11, 2008 at 17:7 Comment(0)
A
-1

I'm not a security expert, but I think the current recommendation is to use bcrypt/blowfish or a SHA-2 variant, not MD5 / SHA1.

You probably need to think in terms of a full security audit, too.

Alfrediaalfredo answered 13/11, 2008 at 18:5 Comment(0)
