I have to unzip and then, after processing, re-zip (archive) the source files. The files are huge: typically around 200-250 GB unzipped (.dat format), 96 files in total. Unzipping takes around 2 hours and re-zipping takes another 1.5 to 2 hours, which is unaffordable. Currently I am using the "zcat" command for unzipping and "gzip -3" for zipping the files. Disk space is not an issue, as we have a 1.5-terabyte mount in place. Can you please suggest some more efficient ways of doing this?
Looking forward to your suggestions. Thanks - Pushkar.
gzcat file.gz | ./fixingScript | gzip -9 - > file.tmp.gz && mv file.tmp.gz file.gz
? (Sorry, I don't have time to look up the exact syntax you'd use with zip utilities.) This should essentially cut your processing time down to the longer of the two steps, unzip or re-zip. Or, if this is something you can rearchitect, go for smaller files, or for something that can be fed into a large parallel processing system such as Hadoop. Good luck. – Magel

cp file.zip file.orig.zip && unzip file.zip && load_to_informatica file && rm file && mv file.orig.zip file.zip
So you're keeping a copy of your zipped file, unzipping temporarily, and after the unzipped file is loaded you just delete it and rename the saved .zip copy back to file.zip. Good luck. – Magel
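The keep-a-copy trick above can be sketched as a small script. This is a gzip-based sketch of the same idea (the original snippet used zip/unzip), and the `wc` call is only a placeholder for the real load step, which isn't shown in the source:

```shell
#!/bin/sh
# Sketch: avoid re-zipping entirely by saving the compressed original,
# unzipping a working copy, loading it, then restoring the saved archive.
set -e

# create a small demo archive (stands in for the real file.dat.gz)
printf 'payload\n' > file.dat
gzip -3 file.dat                        # produces file.dat.gz

cp file.dat.gz file.orig.gz             # keep the compressed original
gzip -dc file.dat.gz > file.dat         # unzip a working copy
wc -c file.dat > /dev/null              # placeholder for the load step
rm file.dat                             # discard the huge raw file
mv file.orig.gz file.dat.gz             # restore the original; no re-zip
```

This only helps when the processing step does not modify the file contents (pure read/load); if the data changes, the re-compression cost comes back.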