Securely storing and searching by social security number

So I'm working on a supplemental web-based system required by an HR department to store and search records of former personnel. I fought the requirement, but in the end it was handed down that the system has to both enable searching by full SSN, and retrieval of full SSN. My protestations aside, taking some steps to protect this data will actually be a huge improvement over what they are doing with it right now (you don't want to know).

I have been doing a lot of research, and I think I have come up with a reasonable plan -- but like all things crypto/security related there's an awful lot of complexity, and it's very easy to make a mistake. My rough plan is as follows:

  1. On the first run of the application, generate a large random salt and a 128-bit AES key using RijndaelManaged.
  2. Write both of these out to a plaintext file for emergency recovery. This file will be stored offline in a secure physical location. The application will check for the presence of the file and scream warnings if it is still sitting there.
  3. Store the salt and key securely somewhere. This is the part I don't have a great answer for. I was planning on using DPAPI, but I don't know how secure it really is at the end of the day. Would I be better off just leaving it in plaintext and restricting filesystem access to the directory it's stored in?
  4. When writing a record to the database, hash the SSN along with the large salt value above to generate a field that is searchable (but not recoverable without obtaining the salt and brute forcing all possible SSNs), and AES encrypt the raw SSN value with a new IV (stored alongside) to generate a field that is retrievable (with the key/iv) but not searchable (because encrypting the same SSN twice should yield different output).
  5. When searching, just hash the search value with the same salt and look it up in the DB
  6. When retrieving, decrypt the value from the DB using the AES key/iv

Other than needing a way to store the keys in a relatively secure way (number 3 above) it seems solid enough.

Things that won't work for us:

  • "Don't do any of this" is not an option. This needs to be done, and if we don't do it they'll a) get mad at us and b) just pass all the numbers around in a plaintext document over email.

This will be internal to our network only, so we have at least that layer of protection on top of whatever is implemented here. Access to the application itself will be controlled by Active Directory.

Thank you for reading, and for any advice.

Update #1: I realized from the comments that it makes no sense to keep a private IV for the SSN retrieval field. I updated the plan to properly generate a new IV for each record and store it alongside the encrypted value.

Update #2: I'm removing the hardware stuff from my list of stuff we can't do. I did a bit of research, and it seems like that stuff is more accessible than I thought. Does making use of one of those USB security token things add meaningful security for key storage?

Reamer answered 18/7, 2013 at 20:6 Comment(10)
Wouldn't IT Security have been a better place to ask this? - Cythiacyto
We certainly don't want them to get mad at you. - Endora
First: security.stackexchange.com may help more. Second: I believe generating a new IV with each encryption and storing it with the encrypted SSN, so that you can't tell if an SSN is the same in two places, would be better. Note, though, I'm not a security professional. Look into that option thoroughly before implementing it. - Palgrave
Will you need to display/retrieve the SSN from the DB, or use it for search purposes only? - Disagreement
I like it, but one note would be that the salt you use for hashing doesn't need to be secret. Having it be unique for each SSN (just like the IV) would be better. You could even use the IV as the salt and kill two birds with one stone. - Retinue
@EricPetroelje Wouldn't having a different salt for each hash make searching impossible? Also, wouldn't it be pretty trivial to reverse a hash of an SSN if you had the salt, since there's so little entropy (only 1 billion possible SSNs)? - Reamer
A salt (and an IV) is, by definition, a public value. It should be different for each entry. It should change every time the hash (or encryption) function is called. See point #2 here: blog.cryptographyengineering.com/2011/11/… As you've already found out, cryptography is really hard and easy to get wrong. If this is important, get expert consultation. - Inviolate
If the system has to support retrieval of the SSN then the cat is out of the bag. I don't see how storing it encrypted adds protection. If this is an HR application then hopefully it is a secured application with database and table security. Really, what is the chance they hack the database but not the salt? - Intermezzo
@Blam It is a secured application in the general case, and will be kept internal to our network. As far as the cat being out of the bag, maybe you're right; that's one of the things I'm trying to find out. Maybe there is no benefit to encrypting/hashing them at all, and I should save myself the headache and just rely on network/server/database level security not to fail. - Reamer
Even if you encrypt at the table level, you need network, server, and database level security not to fail. If the app must produce that data then that is where someone is going to steal it. If an unarmed guard walks the money to the curb in a paper bag then that is where I am going to steal it. - Intermezzo

I've had to solve a similar problem recently and have decided to use an HMAC for the hashing. This would provide more security than a simple hash, especially as you can't salt the value (otherwise it wouldn't be searchable).
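A minimal sketch of the HMAC search field, written in Python for brevity rather than the .NET stack the question uses. The names `normalize_ssn` and `search_token`, the choice of SHA-256, and the sample SSN are illustrative assumptions; the key would come from whatever key storage is eventually chosen.

```python
import hashlib
import hmac
import secrets

DIGITS = "0123456789"


def normalize_ssn(ssn: str) -> str:
    """Strip separators so '123-45-6789' and '123456789' produce the same token."""
    digits = "".join(ch for ch in ssn if ch in DIGITS)
    if len(digits) != 9:
        raise ValueError("an SSN must contain exactly 9 digits")
    return digits


def search_token(key: bytes, ssn: str) -> str:
    """Deterministic keyed hash: the same key and SSN always give the same token."""
    message = normalize_ssn(ssn).encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Assumption: a real deployment loads the key from protected storage
# instead of generating a fresh one on every start.
key = secrets.token_bytes(32)

# Stand-in for an indexed database column holding the tokens.
index = {search_token(key, "123-45-6789"): "record-1"}

# Searching hashes the query the same way and does an exact-match lookup.
found = index.get(search_token(key, "123456789"))
```

Because the token is deterministic, two records with the same SSN share a token, which is what makes the lookup possible. Without the key, an attacker cannot compute tokens for candidate SSNs.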

Then, as you say, use AES with a random IV for the reversible encryption.

It may be that you don't need to encrypt this data, but I had no choice, and this seemed like a reasonable solution.

My question on IT Security: https://security.stackexchange.com/questions/39017/least-insecure-way-to-encrypt-a-field-in-the-database-so-that-it-can-still-be-in

Effect answered 18/7, 2013 at 22:47 Comment(0)

With respect to key storage, there are two methods you can use if you choose to store your AES key in the web.config. The first is DPAPI, as you mentioned, which encrypts your web.config application setting for that box. The second is an RSA key (check out this MSDN tutorial). This encrypts your web.config just like DPAPI, but the same RSA key can be used on multiple boxes, so if the application is clustered, the RSA key is the better choice (it is just more complicated to set up).

As for generating the key: do it before you run your application, and not on the machine running the app. This way there's no chance you're going to leave the text file in the directory. You should generate the key as follows:

  1. Generate a random value using RNGCryptoServiceProvider.
  2. Generate a random salt value using RNGCryptoServiceProvider.
  3. Hash the two values with PBKDF2 (Rfc2898DeriveBytes).

The reason to use the key derivation step is that it protects you in case RNGCryptoServiceProvider is ever found to be insecure for some reason, which does happen with random number generators.
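The three steps can be sketched roughly as follows, in Python rather than .NET. Here `secrets.token_bytes` stands in for the .NET random number generator and `hashlib.pbkdf2_hmac` for Rfc2898DeriveBytes. The function name `derive_aes_key`, the SHA-256 choice, and the iteration count are assumptions made for illustration, not values from this answer.

```python
import hashlib
import secrets


def derive_aes_key(random_value: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Run PBKDF2-HMAC over the random value and salt to get a 32-byte key."""
    return hashlib.pbkdf2_hmac("sha256", random_value, salt, iterations, dklen=32)


# Steps 1 and 2: two independent values from the operating system's CSPRNG.
random_value = secrets.token_bytes(32)
salt = secrets.token_bytes(16)

# Step 3: derive the key from them (32 bytes, the size AES-256 expects).
aes_key = derive_aes_key(random_value, salt)
```

The derivation is deterministic, so the same two inputs always reproduce the same key. Whatever goes into the offline backup therefore needs to be either the derived key or both inputs.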

Use AES-256 instead of AES-128. These algorithms are extremely fast anyway, so the higher security is almost free. Also make sure you're using the algorithm in CBC or CTR mode (CTR is available in the BouncyCastle library).

Now, this will not give your key absolute protection if someone is able to put an .aspx file in your directory. Because that file becomes part of your application, it would have access to your decrypted values, including your key. I mention this because your network and server security will have to be top notch, so I would highly recommend working hand in hand with your network security team to ensure that nobody has access to that box except the people in the HR department who need it (via a firewall, not Active Directory). Do NOT make this application publicly accessible from the internet in any way, shape, or form.

You also cannot trust your HR department. Someone could become a victim of a social engineering attack and end up giving away their login, thus destroying your security model. So in addition to working with your network team, you should integrate a two-factor authentication mechanism for getting into the system. I highly recommend going with an actual RSA key or something similar rather than implementing TOTP. This way, even if someone from the department gives away their password because they thought they were winning a free iPad, the attacker would still need a physical device to get into the application.

Log everything. Any time someone sees an SSN, make sure to log it somewhere that will be part of a permanent record that's archived on a regular basis. This will allow you to mitigate quickly. I would also put limits on how many records a person can see in a particular time frame, so that you know if someone is mining data from within your application.
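One way the logging and the per-user limit might fit together is sketched below in Python. The class name `SsnAccessMonitor`, its parameters, and the in-memory list used as the log are all hypothetical; a real system would write to durable, archived storage and record wall-clock timestamps.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple


class SsnAccessMonitor:
    """Records every SSN view and refuses views beyond a per-user limit."""

    def __init__(self, max_views: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_views = max_views
        self._window = window_seconds
        self._clock = clock
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)
        # Stand-in for an append-only audit store that gets archived.
        self.audit_log: List[Tuple[float, str, str, bool]] = []

    def request_view(self, user: str, record_id: str) -> bool:
        """Return True if the view is allowed; log the attempt either way."""
        now = self._clock()
        recent = self._recent[user]
        # Forget views that have aged out of the window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        allowed = len(recent) < self._max_views
        if allowed:
            recent.append(now)
        # Denied attempts are logged too, since they may indicate data mining.
        self.audit_log.append((now, user, record_id, allowed))
        return allowed


# Allow each user two SSN views per hour (illustrative numbers).
monitor = SsnAccessMonitor(max_views=2, window_seconds=3600)
results = [monitor.request_view("alice", f"rec-{i}") for i in range(3)]
```

Injecting the clock keeps the limiter easy to test. The limit is tracked per user, so one person hitting it does not block anyone else.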

Create a SQL user specifically to access this table, and do not let any other user have access to it. This ensures the table data can be viewed only with that particular user ID and password.

Before deploying to a production environment, you should hire a penetration testing team to test the application and see what they can get. This will go a long way toward hardening the application against potential attackers, and they can give you great advice on how to harden its security further.

Staceestacey answered 19/7, 2013 at 10:3 Comment(0)

Create a new salt and IV for each record. If you need to dump the data into a report for some reason (hopefully without my SSN in it), you would be able to use the method you describe with the unique salt and IV. If you only need to search on an SSN, you could actually hash it instead of using a reversible encryption (more secure).

Ramberg answered 18/7, 2013 at 20:29 Comment(3)
When it comes to SSNs though, isn't a hash more or less tantamount to reversible encryption because there's so little entropy in the data? If someone knew how the hash was performed (what the salt was), wouldn't it be trivial to hash all one billion possible SSNs and see which ones match? - Reamer
What you are describing is called a rainbow table: a lookup of hashes. This is exactly why I and others are proposing hashing the values with unique salts. This makes the creation of this type of table very difficult. You would have to generate every possible hash for every record, which takes a long time. Simply encrypting or hashing with a common salt/key combination is very weak. - Ramberg
From my understanding though, wouldn't using a unique salt per record destroy my ability to search (match) based on SSN later? Also, given the nature of social security numbers, would you even need a rainbow table? It seems like an attacker could easily just compute every possible (or reasonable, given the constraints on SSNs) hash for each record, unique salt or not. - Reamer

I think I read somewhere once that hashing a limited set of inputs gets you absolutely nothing. A quick google turned up this SO post with similar warnings:

Hashing SSNs and other limited-domain information

I must admit that I am also no security expert, but the number of possible inputs is smaller than 10^9, which any decent hacker should be able to breeze through in a matter of hours. Given that, hashing an SSN seems like adding a small layer of annoyance rather than an actual security barrier.
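To make the enumeration concrete, here is a small Python sketch. It assumes an unkeyed SHA-256 over salt plus SSN and an attacker who already holds the salt; the function names, the salt value, and the sample SSN are all hypothetical.

```python
import hashlib
from typing import Iterator, Optional


def salted_hash(salt: bytes, ssn: str) -> str:
    """Unkeyed salted hash of an SSN (SHA-256 is an assumption)."""
    return hashlib.sha256(salt + ssn.encode("ascii")).hexdigest()


def candidates(prefix: str) -> Iterator[str]:
    """Yield every 9-digit string that starts with the given digits."""
    width = 9 - len(prefix)
    if width < 1:
        raise ValueError("prefix must be shorter than 9 digits")
    for n in range(10 ** width):
        yield prefix + str(n).zfill(width)


def recover(salt: bytes, target: str, prefix: str = "") -> Optional[str]:
    """Try each candidate until one hashes to the stolen value."""
    for ssn in candidates(prefix):
        if salted_hash(salt, ssn) == target:
            return ssn
    return None


# Hypothetical leak: the attacker holds both the salt and a stored hash.
salt = b"leaked-salt"
stolen = salted_hash(salt, "123456789")

# The demo searches only the 10,000 SSNs sharing the first five digits so it
# finishes quickly; an empty prefix would walk all 10**9 candidates.
recovered = recover(salt, stolen, prefix="12345")
```

The full space is 100,000 times larger than the slice searched here, but the loop is the same, and a unique salt per record only means repeating it once per record.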

Rather than doing it this way, could you do something else? For example, SSNs only have value to an attacker if they can associate a name with a number (since anyone can enumerate all the numbers easily enough). In that case, could you encrypt the user ID that the SSN links to in such a way that it would be impractical to attack? I am assuming your employees table has some sort of ID, but maybe instead of that you could use a hash of their email or some sort of GUID? That way, even if attackers do get your SSN data, they would not be able to tell which employee each number belongs to until they managed to brute-force that link.

Then again, that approach is also flawed, since your company may not have that many employees in total. At that point it would be a relatively simple matter of guessing and checking against a company directory to obtain everything. No matter how you slice it, this security flaw is going to exist if SSNs must be stored with other identifying data.

Jair answered 18/7, 2013 at 20:54 Comment(1)
Yes, I had pretty much come to the conclusion that a hacker could easily reverse a hash of a Social Security number, given the salt value used to hash it. If you can manage to keep the salt value secret you can prevent this, but at that point you're almost doing de facto symmetric encryption in a bad way. Any way I slice it, the security of the whole system comes down to protecting a secret key. - Reamer
