I am thinking of building a cloud-based sync solution (something like Dropbox):
What would a robust architecture look like?
What technologies would you need to support different platforms like Windows, Mac, Linux, and mobile devices?
What efficient synchronization algorithms would you use?
I know a naive architecture/solution would be:
1. Make a network call to your cloud storage and fetch the sync folder's tree structure (metadata only).
2. Run a filesystem monitor on the client to construct the local sync folder's tree structure (I guess you would use something like lsyncd for the filesystem monitor? See the watcher sketch after this list.)
3. Retrieve the sync folder structure from the previous sync. Now you have three folder tree structures on the client. Using these three trees, you can determine what needs to be done on the local folder and what needs to be done on the remote folder on the server: add, delete, edit, conflict resolution, etc., using some kind of predetermined, application-specific rules. (A sketch of this three-way comparison also follows the list.)
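
For the filesystem monitor in step 2, lsyncd is Linux-only (it wraps inotify), so for cross-platform clients something like the Python watchdog library may be a better fit, since it abstracts inotify, FSEvents, and ReadDirectoryChangesW. A minimal sketch of what I have in mind (the path is just a placeholder):

```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class SyncEventHandler(FileSystemEventHandler):
    """Collects paths touched since the last sync pass."""
    def __init__(self):
        super().__init__()
        self.pending = set()

    def on_any_event(self, event):
        # A real client would debounce and distinguish create/modify/delete/move.
        if not event.is_directory:
            self.pending.add(event.src_path)

handler = SyncEventHandler()
observer = Observer()
observer.schedule(handler, "/path/to/sync/folder", recursive=True)  # placeholder path
observer.start()
try:
    while True:
        time.sleep(5)  # periodically drain handler.pending into the sync engine
finally:
    observer.stop()
    observer.join()
```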
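
And for step 3, my understanding is that the comparison reduces to set operations over the three trees, each represented as a mapping from relative path to content hash. A sketch, with illustrative rule names (pull/push/delete/conflict are my own labels):

```python
def three_way_diff(base, local, remote):
    """Compare the last-synced tree (base) against the current local and
    remote trees. Each tree is a dict: relative path -> content hash."""
    actions = []
    for path in set(base) | set(local) | set(remote):
        b, l, r = base.get(path), local.get(path), remote.get(path)
        if l == r:
            continue                      # both sides already agree
        if l == b:                        # only the remote side changed
            actions.append(("pull", path) if r else ("delete_local", path))
        elif r == b:                      # only the local side changed
            actions.append(("push", path) if l else ("delete_remote", path))
        else:                             # both sides changed: conflict
            actions.append(("conflict", path))
    return actions
```

The conflict-resolution policy (rename one copy, last-writer-wins, etc.) is the application-specific part.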
This architecture might be sufficient, but the devil is in the details. What if the sync folder tree is very large (that is, very broad and very deep)? Clearly an efficient algorithm to determine diffs would be needed. What if the network connection drops and you don't get or send the entire tree properly? There is also sending only file diffs to reduce the network payload, and so on.
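
On the "very broad and very deep" problem, one idea I've seen (I don't know if Dropbox does exactly this) is Merkle-style hashing: each directory's hash is derived from its children's hashes, so two sides can skip any subtree whose root hashes already match and recurse only into the ones that differ. Roughly:

```python
import hashlib
import os

def tree_hash(root):
    """Compute a Merkle-style hash for a directory: hash file contents at
    the leaves, and hash the sorted (name, child_hash) pairs above them."""
    h = hashlib.sha256()
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            child = tree_hash(path)
        else:
            with open(path, "rb") as f:
                child = hashlib.sha256(f.read()).hexdigest()
        h.update(f"{name}:{child}".encode())
    return h.hexdigest()
```

The comparison cost then scales with the size of the change rather than the size of the tree, and since the exchange carries no session state it can simply be restarted after a dropped connection.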
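
For sending only file diffs, the classic approach is the rsync algorithm: one side publishes weak rolling checksums (plus strong hashes) of fixed-size blocks, and the other slides a byte-at-a-time window over its copy to find matching blocks, transmitting only the literal bytes in between. A toy sketch of just the rolling checksum (the block size is an arbitrary choice):

```python
BLOCK = 4096  # illustrative block size

def weak_checksum(data):
    """Adler-32-style weak checksum, kept as its two running sums."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % 65536
    return a, b

def roll(a, b, out_byte, in_byte, block_len=BLOCK):
    """Slide the window one byte: drop out_byte, append in_byte, in O(1)."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - block_len * out_byte + a) % 65536
    return a, b
```

Candidate matches found via the cheap weak checksum are confirmed with a strong hash, so only the changed regions of a large file actually cross the network.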
I am aware these are things I have to design for, but my question is whether this architecture is sufficient and whether I should spend my time on the details. How is Dropbox designed, and what technologies and algorithms do they use to make syncing of large folder structures and data sizes so efficient? Are there any resources/books I can consult on designing something like this?
Thanks in advance.