I have to find SHA256 hashes of 2^25 random strings. And then look for collision (using birthday paradox for the last, say, 50 bits of the hash only).
I am storing the string:hash pair in a dict variable. Then sorting the variable with values (not keys) and then looking for collision using a O(n) loop.
The problem is that since there are 2^25 strings and their 2^25 hashes, so the dict variable has 2^50 values in it. This is EXTREMELY resource intensive. So, how do I do this with limited RAM, say, around 1GB?
What I have already tried:
1. Running this with a 6GB swap space. The program ran overnight and was still not done. This is essentially even slower than an O(n_square) search! The hashes are calculated with RAM usage of about 3.2 GB. But after that when it comes to the sort command, the RAM used starts shooting up again! I though the python sort uses In-Place-Quicksort :(
2. I have stored only the hashes and found a collision. But could not find the corresponding string since did not store it.
I am not supposed to use databases etc. At the most a text file but that doesn't help. Also, I am pretty new to Python but do not let that limit the level of your answer.
PS: this is an assignment. Some claim to have found the collisions in under one minute with 300MB RAM. Don't know if that is true. I have solved the problem but the answer is unreachable! At work so do not have access to code right now. Will add soon.
Code:
from Crypto.Hash import SHA256
import os
import random
import string
from operator import itemgetter
def shaa():
trun=[]
clist={}
for i in range(0,33554432):
sha=SHA256.new(str(i)).hexdigest()
sha=int(bin(int(sha,16))[-50:],2)
clist[i]=sha
print 'Hashes done.'
clist=sorted(clist.items(), key=itemgetter(1))
for i in range(0,33554432):
if(clist[i]==clist[i+1]):
#print string[i],string[i+1]
print clist[i]
return 1
return 2
result=2
while(result==2):
result=shaa()
anydbm
module? Strange requirement, though I think there are better solution anway. – Greenleelist.sort
uses Timsort. – Promethean