Deterministic RSA encryption in Java

This is my first question on this site, and I only have a basic mathematical understanding of RSA, so please bear with me! :)

I'm writing a Java web application for my final year project at university. It's a web-based implementation of "Pret-a-voter", a secure voting system, for those who have heard of it.

Essentially my problem is that I want to be able to give someone performing the role of an auditor:

  • a source byte array (the plaintext to be encrypted)
  • an RSA public key file
  • a "destination" byte array, which is the result of my own computation of the cipherdata given the plaintext and the public key

I then want the auditor to be able to perform encryption using the first two items, and be satisfied that the third is the result. I therefore need the encryption to be deterministic, i.e. to produce the same cipherdata every time the same plaintext is encrypted with the same public key.

(Note - I'm working with very small blocks of data in this project - there is no symmetric encryption involved at all... I'm aware this is an "interesting" use of RSA!)

Anyway I found that in Java, using

cipher = Cipher.getInstance("RSA");

uses the default random padding scheme, at a cost of 11 bytes (so with a 2048-bit key pair, it's possible to encrypt 2048/8-11 = 245 bytes). Repeated encryptions of the same plaintext generate different ciphertexts, which is obviously not the ECB mode that I want.

My question is - should I use the following?

cipher = Cipher.getInstance("RSA/ECB/NoPadding");

I've read in lots of places that RSA is insecure without padding. Is that simply because an attacker can build a dictionary of plaintexts/ciphertexts? This is a side-effect of the deterministic encryption I require in order to allow auditors to verify my encryption, and in my scheme auditors are trusted, so that would be OK.
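A minimal sketch of the difference between the two transformations, assuming a standard JDK provider in which "RSA/ECB/PKCS1Padding" is the randomised 11-byte scheme described above. The class and method names (PaddingDeterminismDemo, newKeyPair, encrypt, isDeterministic) are made up for illustration and are not from the project in question:

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.Cipher;

class PaddingDeterminismDemo {

    // Generates a throwaway 2048-bit RSA key pair for the demo.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts plaintext under the given transformation string.
    static byte[] encrypt(String transformation, PublicKey key, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(transformation);
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the same plaintext twice and reports whether the results match.
    static boolean isDeterministic(String transformation, PublicKey key, byte[] plaintext) {
        byte[] first = encrypt(transformation, key, plaintext);
        byte[] second = encrypt(transformation, key, plaintext);
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] plaintext = "vote:3".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println("PKCS1Padding deterministic? "
                + isDeterministic("RSA/ECB/PKCS1Padding", pair.getPublic(), plaintext));
        System.out.println("NoPadding deterministic?    "
                + isDeterministic("RSA/ECB/NoPadding", pair.getPublic(), plaintext));
    }
}
```

The "ECB" in the transformation string is only a naming convention of the Cipher API here; a single RSA operation on one block is performed either way, and the padding part of the string is what decides whether the output is randomised.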

Part two of my question is more Java-related. If I do use RSA/ECB/NoPadding as above, I believe I'm able to provide a source byte array of (say) length 128 (for a 1024-bit RSA key pair) and encrypt that to get another byte array of length 128. If I then try to encrypt that again, with a different 1024-bit public key, I get

javax.crypto.BadPaddingException: Message is larger than modulus

Does anyone know why?

EDIT - encryption with NoPadding doesn't always generate this exception - it's temperamental. However, even when encryption does not generate this exception, decryption generates this:

javax.crypto.BadPaddingException: Data must start with zero

Many thanks for reading through this! Any help would be greatly appreciated.

EDIT - sorry, my original question wasn't very clear about what it is I want to do, so here's an [attempt at an] explanation:

  • The plaintext is a voter's vote in an election.
  • Pret-a-voter aims to be end-to-end verifiable without sacrificing voter confidentiality (etc). After voting, the voter is provided with a receipt that they can use to verify that their vote has been recorded correctly, and which will later show them that their vote has not been tampered with. The voter performs a comparison of the information on their receipt with an identical copy posted on the web.
  • However, it should not be possible for any voter to prove how he/she voted (as that could lead to coercion) so the information is not the plaintext, but an encrypted copy of the vote.
  • In fact, the plaintext is encrypted four times, with four different asymmetric keys - held by two different tellers, each holding two of the keys. So, a vote (plaintext) is provided to one teller, who encrypts it using public key #1, and then encrypts THAT ciphertext with his second public key, gives THAT ciphertext to the second teller who encrypts it with his two keys in the same way. The resulting ciphertext (result of four sequential encryptions) is what is posted to the web (made public). The tellers are trusted.
  • Each encrypted vote can be visualised as an "onion" where the centre is the vote and there are several layers of encryption. In order to get to the vote, each layer must be removed in turn, meaning the corresponding private keys (held by the tellers) must be applied in the reverse sequence. This is key to the security - all tellers must work cooperatively in order to decrypt the votes.
  • The web bulletin board can be visualised as a table with 5 columns - the first (on the left) holds the fully-encrypted votes (also shown on each voter's receipt), and is the only visible column during the vote-casting stage. The second column contains the same set of votes, but with the outer layer removed - teller 2 populates this column and column 3 by decrypting the votes using its private keys during the tallying stage. At the end of the tallying stage, column 5 contains the fully-decrypted votes that can then be tallied.
  • Each voter gets a receipt that links them to an encrypted vote in column 1. This doesn't show how they voted, but allows them to verify that their vote has not been tampered with as throughout the election process they can verify that their encrypted vote is still there in column 1, untouched. This is only half of the "end-to-end verification", of course, since voters are unable to verify that the decryptions have been done correctly, i.e. that there's an entry in column 2 which is their vote minus the outer layer of encryption. Each voter is responsible only for the verification UP TO the point of column 1.
  • Thereafter, it is the auditors' responsibility to check that the entries in column 1 decrypt to column 2, and so on. The way they do this is by relying on deterministic encryption and the public keys used for the encryption being public.
  • Since public keys are public, you don't want people to simply draw lines from column 5 to column 1, joining up someone's vote as it becomes repeatedly encrypted - that way, a receipt that ties you to an encrypted vote actually ties you to an unencrypted, readable vote --> coercion! So, only columns 1, 3 and 5 are public (this is why each teller performs TWO encryptions), and for each entry in column 3, only ONE of the corresponding entries in {2,4} is revealed to auditors. This prevents anyone (even auditors) from linking an encrypted vote to an unencrypted vote.
  • Auditors therefore need to take an entry in column 3, be given the corresponding entry in column 2 and the public key, and perform the same encryption to verify that they do indeed get the entry in column 2.
  • Put together, this offers end-to-end verifiability.

Sorry that was so lengthy - I hope it describes my need for deterministic encryptions. I've missed out a lot of fundamental details (I've modified this scheme heavily) but hopefully the core principles are all there. Thank you so much for reading - I really appreciate it.

Bobbe answered 30/3, 2011 at 15:23 Comment(5)
I don't think RSA/ECB makes any sense at all, as ECB is a chaining mode for a block cipher (or the lack of a chaining mode, to be more exact). Do you have a link to your protocol spec? It may be that you missed something.Farina
I learnt about the RSA/ECB/NoPadding getInstance parameter here: #2714827 Unfortunately the protocol specs are in my head and in the code I've written - I could post the code but it's a huge eclipse project and would probably take a long time to read and for me to explain. Is there any other way to encrypt deterministically in Java, besides the NoPadding option I mentioned (which may be insecure, I'm not sure, but it does seem to work - although sometimes generates "Message is larger than modulus" errors)?Bobbe
Is there perhaps a way I can seed the "random" padding - such that the ciphertext is constant for a given plaintext and given padding seed? I could then give that seed to the auditor.Bobbe
Isn't "Message larger than modulus" true when it happens? You're not supposed to encrypt long messages with RSA; most practical protocols encrypt only a random key for a symmetric cipher, so some implementations don't support RSA on long messages, as it has little practical value... In your case you may need to reimplement RSA yourself; BigInteger was pretty serviceable last time I checked.Farina
BTW, from the little I understood of the protocol, I don't see it being secure. Maybe, with each vote, you should encrypt enough information under the verifiers' keys for them to verify it, but predictable encryption doesn't sound right at all; you're going out of your way to remove security features...Farina
C
5

Removing the padding makes the system insecure. If the public keys are indeed public, as you say, then an attacker can simply go to column 5, take the plaintexts, and encrypt them with the 4 public keys in the proper sequence. They can then match up the resulting ciphertexts with those on the receipts, compromising the "no coercion" property.

Random padding stops this, because the attacker doesn't know what padding to add.

You will need to use normal padding, but reveal a subset of the private keys to a subset of the auditors (usually called "scrutineers" in electoral systems). This means that one scrutineer can confirm that column 1 matches column 2, another can confirm that column 2 matches column 3, and so on. An individual scrutineer can't match a voter to a ballot, only co-operating ones.


You're getting the "Message is larger than modulus" error because each key has a different modulus, so the ciphertext from one encryption, read as an integer, may be greater than or equal to the modulus of the next key and therefore outside the allowable range for the next encryption.
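A toy illustration of that range problem, using unpadded "textbook" RSA on BigInteger with deliberately tiny moduli so the numbers are easy to follow. The values (n1 = 61 × 53 = 3233, n2 = 47 × 59 = 2773, e = 17, m = 65) and the names ModulusRangeDemo, rawEncrypt and fits are chosen only for this sketch; real keys are far larger, but the same comparison applies:

```java
import java.math.BigInteger;

class ModulusRangeDemo {

    // True when value lies in [0, modulus), i.e. it can be an RSA input for that key.
    static boolean fits(BigInteger value, BigInteger modulus) {
        return value.signum() >= 0 && value.compareTo(modulus) < 0;
    }

    // Unpadded RSA: c = m^e mod n. Rejects inputs outside [0, n).
    static BigInteger rawEncrypt(BigInteger m, BigInteger e, BigInteger n) {
        if (!fits(m, n)) {
            throw new IllegalArgumentException("Message is larger than modulus");
        }
        return m.modPow(e, n);
    }

    public static void main(String[] args) {
        BigInteger e = BigInteger.valueOf(17);
        BigInteger n1 = BigInteger.valueOf(3233); // 61 * 53
        BigInteger n2 = BigInteger.valueOf(2773); // 47 * 59, slightly smaller than n1

        BigInteger c1 = rawEncrypt(BigInteger.valueOf(65), e, n1); // 2790
        System.out.println("first layer ciphertext: " + c1);
        System.out.println("fits under second modulus? " + fits(c1, n2));
    }
}
```

With two random keys of the same bit length, whether a given ciphertext happens to fall below the next modulus varies from message to message, which would be consistent with the exception appearing only some of the time.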

Coriecorilla answered 31/3, 2011 at 6:55 Comment(5)
Thanks very much for your thoughts. One key point I forgot to mention is that it is the "cyclic shift" of the candidate list (which is rotated by any integer modulo N (for N candidates) on each ballot) that is encrypted - the "X" position is stored in plain. However, there are also "seeds" that are added at each stage of encryption to prevent an attacker being able to encrypt with the four keys as you say - they don't know the seeds to use and therefore don't know how the cyclic shift should be reversed (by the value of the seed) at each stage.Bobbe
... given this, can I go back to no padding? Or am I still better off looking at revealing a subset of private keys to a subset of scrutineers? Also, unbelievable though it may sound, the way I'd done it so far for repeated encryption of 128-byte arrays (in java, using random padding at a cost of 11 bytes) is to append a 4-byte integer to the byte[], and move the first 15 bytes off to an "overflow" database field. When decrypting, these overflows are then prepended as appropriate. Not ideal at all, but it did work - the only time I encountered problems was when I found it wasn't deterministicBobbe
@Chris: Since the number of candidates N is typically very small, the unknown cyclic shift will not significantly increase the attacker's work - they just need to try all N possible shifts of the plaintexts.Coriecorilla
My idea was to have a (potentially) large cyclic shift, with a range of say 2^32, and use that modulo N as the actual shift, so hopefully that would be OK. However, thinking over the various different ways I might be able to save this project and the hours I've put in already without redesigning from scratch, I think your solution, to stick with random padding and give a private key or two to each auditor, is the one I'll use. I still wish it was possible to just give them the plaintext+PK and ask them to verify my ciphertext calculation, but not to worry. Thank you for your help :)Bobbe
@Chris: The problem is that only the value after the mod N is what matters - all the other possible values are equivalent to one of the 0...N-1 cyclic shift values. No problem. If there is someone in your faculty with a deep understanding of the RSA mathematics who can review your scheme, that would probably be a good idea.Coriecorilla
Q
3

https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Padding

The padding is there precisely to avoid a given plain text being encrypted to a single cyphertext. So if you want a deterministic (single) result for any given plain text your only option is to turn it off.

Quezada answered 30/3, 2011 at 16:3 Comment(2)
Thanks for this. Do you have experience of doing this in Java? If I choose to use RSA/ECB/NoPadding, am I forced to use input data of 2^N bytes? Do I just pad it with zeros? Sorry if that suggestion is ridiculous - I've read so much but am still at a loss for how to achieve what I need to. I've added some bullet points at the end of my question explaining what I need to do if that's any help. Thanks again!Bobbe
No, I don't have first-hand experience of this. You can answer the zero padding question with a simple, empirical test. From your bullets, I can't help thinking a cryptographic hash might be better in some cases. I'll read it again tomorrow, however.Quezada
K
1

So it seems to me that you have 2 main requirements that you are attempting to use deterministic RSA to solve:

  1. Allowing voters to ensure the integrity of their vote
  2. Allowing auditors to ensure the integrity of all votes

Digital Signatures should solve this problem. You can take your ciphertext from column 1, hash it, and encrypt the hash with a private key. That encrypted hash can then be placed in column 2. To verify the integrity of column 1, simply use the corresponding public key to decrypt the hash in column 2, hash column 1, and compare those 2 values. If they are equal, the data has not been tampered with. Only parties that have the private key could possibly tamper with the data in these columns, because only they can make a matching pair. This is similar to an HMAC, but has the advantage of using public/private keys rather than a secret shared key. Thus anybody can verify, but only trusted parties can modify.
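A rough sketch of the sign-then-verify flow described above, using the JDK's java.security.Signature class with the "SHA256withRSA" algorithm (which does the hashing and the private-key operation in one step). The class and method names and the sample data are placeholders for illustration, not part of the voting scheme:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class SignatureDemo {

    // Generates a throwaway 2048-bit RSA key pair for the demo.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes the data with SHA-256 and signs the hash with the private key.
    static byte[] sign(PrivateKey key, byte[] data) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone holding the public key can check that data matches the signature.
    static boolean verify(PublicKey key, byte[] data, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] entry = "column 1 ciphertext bytes".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(pair.getPrivate(), entry);
        System.out.println("untouched entry verifies: " + verify(pair.getPublic(), entry, sig));
        entry[0] ^= 1; // simulate tampering with the stored entry
        System.out.println("tampered entry verifies:  " + verify(pair.getPublic(), entry, sig));
    }
}
```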

One thing to note about a deterministic scheme is that it will leak information in many ways. Let's assume that I know I voted for Blue as my favorite color. I can see that the resulting ciphertext of my vote is 0x12345678. If the scheme is completely deterministic, I know that anybody else that has a corresponding ciphertext of 0x12345678 also voted for Blue. Also, since you will typically have a finite set of vote choices, a chosen plaintext attack is incredibly easy. Thus you really want to let RSA do its job and use the intended padding scheme.
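To make the chosen plaintext point concrete, here is a small sketch of the guessing attack against unpadded RSA. It assumes the attacker knows the public key and the short list of possible votes; the toy key (n = 3233, e = 17), the candidate encodings and the names DictionaryAttackDemo and recoverVote are invented for the example:

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

class DictionaryAttackDemo {

    // Unpadded RSA encryption: c = m^e mod n.
    static BigInteger rawEncrypt(BigInteger m, BigInteger e, BigInteger n) {
        return m.modPow(e, n);
    }

    // Encrypts every candidate with the public key and returns the index of the
    // one matching the observed ciphertext, or -1 if none matches.
    static int recoverVote(BigInteger observed, List<BigInteger> candidates,
                           BigInteger e, BigInteger n) {
        for (int i = 0; i < candidates.size(); i++) {
            if (rawEncrypt(candidates.get(i), e, n).equals(observed)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(3233); // toy modulus, 61 * 53
        BigInteger e = BigInteger.valueOf(17);
        // Hypothetical numeric encodings of the possible votes.
        List<BigInteger> candidates = Arrays.asList(
                BigInteger.valueOf(10), BigInteger.valueOf(20),
                BigInteger.valueOf(65), BigInteger.valueOf(30));

        BigInteger published = rawEncrypt(candidates.get(2), e, n);
        System.out.println("recovered candidate index: "
                + recoverVote(published, candidates, e, n));
    }
}
```

The attacker's work grows only with the number of possible plaintexts, not with the key size, which is why a small set of vote choices offers no protection without randomised padding.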

The next thing you may want to consider is protecting the system from a form of Replay Attack by numbering the votes or something like that. As I understand your scheme, it looks like if I somehow got access to where you store your votes (or got in the middle of any communication), I could essentially spoof or spam fake votes just by replaying or copying data that I've already seen (another problem with being deterministic).

Keeshakeeshond answered 30/3, 2011 at 21:19 Comment(10)
Thank you very much (everyone) for reading and for your comments. I'm not sure it is an HMAC I need - I've added some bullet points at the bottom of the question to try and clarify why it is I want the RSA encryption to be deterministic. But I may be missing something or failing to see an alternative - if you have time to take a look and/or let me know if an HMAC could still work, I'd really appreciate it. Thanks.Bobbe
I gave it another shot. You're right, an HMAC didn't quite fit your needs, but I think Digital Signatures do. I think what you were attempting to do was close to a digital signature anyway. Most of all, I strongly urge you not to use RSA in the way that you are trying to use it. It will not be secure.Keeshakeeshond
Having read through everyone's answers/comments I think yours (encrypting hash with private key, and distributing the public key to auditors who can use it to decrypt and verify the integrity of the hashes) might be the best solution... I haven't heard of private key encryption / public key decryption before, didn't know it was possible with RSA - will need to read up on it. Thanks very much for the idea. The chosen plaintext attack wouldn't work as the encrypted information is in fact the cyclic shift of the candidate list, which is a large number (to be encrypted) modulo N candidates.Bobbe
I hope it works for you, I'm happy to answer any other questions you may have. I think that by using the cryptographic building blocks in the manner that they were intended, you'll find that you have everything you need to accomplish your task. The trick is figuring out what blocks to use in what places.Keeshakeeshond
Spent a while thinking about this and although I'm sure this is more or less the solution, it's still not completely clear to me how it would work. The aim is to convince the voters that the 4x encrypted votes in col 1 are in fact the votes in col 5, just encrypted and shuffled. Cols 1,3,5 are public. For each entry in col 3, an auditor is shown the link either to 1 or 5 (via 2 or 4 respectively; their choice, so everyone can be reasonably confident that all links are correct). By doing as you say - hashing column 1, then private key encrypting four times in cols 2-5, can't someone take the...Bobbe
... 4x encrypted hash in column 5 then decrypt with the 4 public keys to link a vote in column 5 to the encrypted vote in column 1 - by performing public key decryption for each key in sequence?Bobbe
Sounds like what you really want to sign is column 5 then, right? If column 5 is their plaintext vote, you want them to be sure that their actual vote was recorded properly (in a manner that can also be verified by auditors). Hash the plaintext of the vote, encrypt it with a private key and provide the public key for verification. Place that in your table. When I want to verify, I take my vote (I know what my vote was), hash it, and compare the hash with the hash from the table after I decrypt that with the public key.Keeshakeeshond
Ah, the reason only auditors can audit beyond column 1 is that it should not be possible for voters to show how they voted (coercion etc) so really the voter can only compare their receipt to something in column 1, with only the two tellers working cooperatively being able to generate 2,3,4 and eventually 5. Auditors are there to check 1-2-3 (i.e. 1 is 2 encrypted, which is 3 encrypted) XOR 3-4-5 for each value in 3, providing reasonable assurance that 1 decrypts to 5. This is why it'd be so easy if I could just give pub keys and tell auditors to encrypt 3 to 2 and 2 to 1 :) thanks so much.Bobbe
I'm sorry, I'm having a real difficult time getting my head around the requirements, the columns, and what they do. The best way to approach this problem is most likely to start out simple from the beginning with your core requirements, then design the system with simple building blocks to achieve each requirement. I think the problem is that you've gone too far down one path because you made assumptions that turned out to be incorrect (like using RSA deterministically). Or maybe you're over committed at this point and it's better just to force what you've got.Keeshakeeshond
You're definitely right about having made incorrect assumptions and gone too far down an incorrect path as a result... I suspect I'll need to force what I've got as you say, but I'm armed with an awful lot more knowledge now than I was before, so thanks very much for that :)Bobbe
