Given two file trees A and B, is it possible to determine the shortest sequence of operations (or at least a short one) needed to transform A into B?
An operation can be:
- Create a new, empty folder
- Create a new file with any contents
- Delete a file
- Delete an empty folder
- Rename a file
- Rename a folder
- Move a file into another existing folder
- Move a folder into another existing folder
A and B are considered identical when they contain the same files, with the same names and the same contents (or the same size and the same CRC), in the same folder structure.
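To make this definition concrete, here is a minimal Python sketch of the identity check. The names `snapshot` and `trees_identical` are made up for illustration, and I am assuming CRC-32 (via `zlib.crc32`) plus file size is an acceptable stand-in for "same contents".

```python
import os
import zlib


def snapshot(root):
    """Map every path relative to `root` to ('dir',) or ('file', size, crc32)."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            entries[rel] = ("dir",)  # recorded so empty folders count too
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Reads the whole file into memory; fine for a sketch,
            # a real tool would hash in chunks.
            with open(full, "rb") as fh:
                data = fh.read()
            rel = os.path.relpath(full, root)
            entries[rel] = ("file", len(data), zlib.crc32(data))
    return entries


def trees_identical(root_a, root_b):
    """True when both trees hold the same names, structure, sizes and CRCs."""
    return snapshot(root_a) == snapshot(root_b)
```

Any sequence of operations is correct exactly when `trees_identical(A, B)` holds afterwards, so this also works as a test oracle for whatever algorithm is used.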
This question has been puzzling me for some time. So far I have the following basic idea:
- Compute a database for each tree:
  - Store file names and their CRCs.
  - Then find all folders with no subfolders and, for each one, compute a CRC from the CRCs of the files it contains and a size from the total size of those files.
  - Ascend the tree to compute a CRC for each parent folder.
- Run the following loop over database A and database B:
  - Compute A ∩ B and remove this intersection from both databases.
  - Use an inner join to find matching CRCs in A and B, folders first, ordered by size descending.
  - While there is a result, use the first result to perform a folder or file move (creating new folders if necessary), then remove the source rows of that result from both databases. If there was a move, update the CRCs of the new location's parent folders in database A.
- Finally, remove all files and folders still referenced in database A and create those still referenced in database B.
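The steps above can be sketched roughly as follows in Python. This is a simplified illustration, not a finished implementation. The names `build_db` and `plan_moves` are hypothetical, and the trees are modelled in memory as nested dicts (a `dict` is a folder, `bytes` is a file) to keep the example self-contained. Two things are my own assumptions: a folder's CRC folds in the names of its children as well as their CRCs, and the greedy pass skips anything inside an already moved folder instead of recomputing parent CRCs after each move.

```python
import zlib


def build_db(tree):
    """Return {path: (crc, size, is_dir)} for every node, computed bottom-up."""
    db = {}

    def visit(node, path):
        if isinstance(node, bytes):
            entry = (zlib.crc32(node), len(node), False)
        else:
            crc, size = 0, 0
            for name in sorted(node):
                child_crc, child_size, _ = visit(node[name], path + "/" + name)
                # Assumption: fold child names into the folder CRC, so a
                # folder matches only if its contents are named the same.
                crc = zlib.crc32(f"{name}:{child_crc}".encode(), crc)
                size += child_size
            entry = (crc, size, True)
        db[path] = entry
        return entry

    visit(tree, "")
    return db


def plan_moves(db_a, db_b):
    """Greedy list of (source path in A, destination path in B) moves."""
    # Remove A ∩ B: same path with the same CRC, size and kind.
    only_a = {p: e for p, e in db_a.items() if db_b.get(p) != e}
    only_b = {p: e for p, e in db_b.items() if db_a.get(p) != e}

    # "Inner join" on (crc, size, is_dir).
    by_entry = {}
    for path, entry in only_b.items():
        by_entry.setdefault(entry, []).append(path)
    pairs = [(src, dst)
             for src, entry in only_a.items()
             for dst in by_entry.get(entry, [])]
    # Folders first, then larger items first.
    pairs.sort(key=lambda pr: (not db_a[pr[0]][2], -db_a[pr[0]][1], pr))

    def covered(path, roots):
        return any(path == r or path.startswith(r + "/") for r in roots)

    moves, used_src, used_dst = [], [], []
    for src, dst in pairs:
        # Skip rows already handled by moving an enclosing folder.
        if covered(src, used_src) or covered(dst, used_dst):
            continue
        moves.append((src, dst))
        used_src.append(src)
        used_dst.append(dst)
    return moves


tree_a = {"docs": {"a.txt": b"hello", "b.txt": b"world"}, "tmp": {"c.txt": b"!"}}
tree_b = {"archive": {"docs": {"a.txt": b"hello", "b.txt": b"world"}}, "c.txt": b"!"}
for src, dst in plan_moves(build_db(tree_a), build_db(tree_b)):
    print("move", src, "->", dst)
```

Whatever remains unmatched in A after these moves would be deleted, and whatever remains unmatched in B would be created (here, the `archive` folder has to be created before `docs` can be moved into it). Note that CRC-32 collisions are possible, so a real tool would want to confirm matches with a stronger hash or a byte comparison.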
However, I suspect this approach is far from optimal. What advice could you give me?
Thank you!