UPDATE: I solved the problem with a great external library - https://code.google.com/p/xdeltaencoder/. The way I did it is posted below as the accepted answer
Imagine I have two separate pcs who both have an identical byte[] A.
One of the pcs creates byte[] B, which is almost identical to byte[] A but is a 'newer' version.
For the second pc to update his copy of byte[] A into the latest version (byte[] B), I need to transmit the whole byte[] B to the second pc. If byte[] B is many GB's in size, this will take too long.
Is it possible to create a byte[] C that is the 'difference between' byte[] A and byte[] B? The requirements for byte[] C is that knowing byte[] A, it is possible to create byte[] B.
That way, I will only need to transmit byte[] C to the second PC, which in theory would be only a fraction of the size of byte[] B.
I am looking for a solution to this problem in Java.
Thankyou very much for any help you can provide :)
EDIT: The nature of the updates to the data in most circumstances is extra bytes being inserted into parts of the array. Ofcourse it is possible that some bytes will be changed or some bytes deleted. the byte[] itself represents a tree of the names of all the files/folders on a target pc. the byte[] is originally created by creating a tree of custom objects, marshalling them with JSON, and then compressing that data with a zip algorithm. I am struggling to create an algorithm that can intelligently create object c.
EDIT 2: Thankyou so much for all the help everyone here has given, and I am sorry for not being active for such a long time. I'm most probably going to try to get an external library to do the delta-encoding for me. A great part about this thread is that I now know what I want to achieve is called! I believe that when I find an appropriate solution I will post it and accept it so others can see as to how I solved my problem. Once again, thankyou very much for all your help.
A
andB
. The problem is figuring out how to map positionx
inC
toA
. That is, the delta (C
), would only be the length of the number bytes that have changed, but how do you then map that back to the correct position inA
. You could seed the position information intoC
, so thatx
would be the position andx+1
would be the actual data...for example. You would then need to consider run length encoding (so you would havex
= postion,x+1
is length, thenx+2+length
is the data...Or you could run it all through aZipStream
– Adelaadelaidabyte
not being big enough, didn't think about that...you could do some bit shifting (is that the right term) and use 4/8 bytes per position or some such...about here I'm thinking there has to be an easier way to do this... – Adelaadelaida