Software design & architecture: How to sync data between a directory tree and a database

I've been twisting my head over this for a while now and haven't arrived at a final solution, so I hope to find some exchange or help here on how to solve this issue at an architectural level.

I'm currently facing the following scenario: I want to write a web application (I'm doing it in Java, but that is not really relevant to a solution, as this is a question on a higher level) with this kind of relation:

Event --1:n--> Team --1:n--> Participant

Meaning: I have an event, which contains a number of teams, each having a number of participants. So far so good - this would be an easy relation in a SQL database.
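
For illustration only, here is a minimal sketch of that relation as JPA entities (class and field names are assumptions on my part, and each entity would live in its own file):

    import javax.persistence.*;   // jakarta.persistence on newer stacks
    import java.util.List;

    @Entity
    class Event {
        @Id @GeneratedValue Long id;
        String name;
        @OneToMany(mappedBy = "event", cascade = CascadeType.ALL)
        List<Team> teams;
    }

    @Entity
    class Team {
        @Id @GeneratedValue Long id;
        String name;
        @ManyToOne Event event;          // owning side of Event 1:n Team
        @OneToMany(mappedBy = "team", cascade = CascadeType.ALL)
        List<Participant> participants;
    }

    @Entity
    class Participant {
        @Id @GeneratedValue Long id;
        String name;
        @ManyToOne Team team;            // owning side of Team 1:n Participant
    }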

But there is also a directory tree representing the same relation as a file structure:

+--event1
|  +--team1
|  |  +--participant1
|  |  +--participant2
|  |  +--participant3
|  +--team2
|  |  +--participant4
|  +--team3
+--event2
|  +--team4
...

(I think you get the idea.) In each participant's directory there are numerous files, which are copied there via the file system. Whenever a directory exists on the file system, it should be connected to a corresponding entry in the database, which holds some additional data that should be displayed together with the files in the web GUI. It is not defined which will exist first (database entry or directory), as these are created by different users.

Now there are a couple of things to keep in mind, all of which seem sensible to me:

  • When a directory name changes (whether event, team, or participant), it should still relate to the same entry in the database (because there might be other entities that still relate, for example, to a participant).
  • The directory of any event/team/participant might be deleted - the data in the database should then remain. BUT: if a new directory with the same name is created at a later time and the event is 'closed', this directory should point to a new database entry (e.g. a new event). If the event is still active, the creation of a directory with the same name should map to the previously assigned entry in the database.
  • Ideally, the creation of a directory already leads to the creation of a corresponding database entry (see the sketch after this list).
  • It should also be possible to create an event/team/participant in the web GUI, which then automatically creates a corresponding directory on the filesystem.
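
As a sketch of the third point on the filesystem side, here is what watching for new directories could look like with Java NIO's WatchService (the root path and the createDatabaseEntry hook are made up for illustration):

    import java.io.IOException;
    import java.nio.file.*;

    public class DirectoryWatcher {

        public static void main(String[] args) throws IOException, InterruptedException {
            Path root = Paths.get("/data/events");    // assumed root of the tree
            WatchService watcher = FileSystems.getDefault().newWatchService();
            root.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

            // Note: this watches one level only; the team/participant levels
            // would need their directories registered recursively.
            while (true) {
                WatchKey key = watcher.take();        // blocks until something happens
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path created = root.resolve((Path) event.context());
                    if (Files.isDirectory(created)) {
                        createDatabaseEntry(created); // hypothetical persistence hook
                    }
                }
                key.reset();
            }
        }

        static void createDatabaseEntry(Path dir) {
            System.out.println("Would create a DB entry for " + dir);
        }
    }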

I hope my description is good enough to understand the scenario. I already have some approaches in mind, but none of them convinces me as a robust solution. So hopefully one of you already has some idea on that. I'm pretty open to any technology or framework that might help to solve this problem.

I´m looking forward to your ideas and a nice discussion!

Thanks for your help!

Sicular answered 4/9, 2017 at 19:15 Comment(0)

First of all, the uniqueness of the directories must be designed. Did you consider using a hidden file, containing a unique key, inside each watched directory? If it's not a high-load system, the creation time might be used as that key.

Having the unique key in the file system, it's not so hard to reflect the existing unique keys in the database and organize synchronization between the two storages.
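
A minimal sketch of that idea, assuming a marker file named .dirkey holding a random UUID (both the file name and the use of UUIDs instead of creation time are my assumptions):

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.UUID;

    public class DirectoryKey {

        // Leading dot hides the file on Unix; Windows would also need
        // the DOS 'hidden' attribute set.
        private static final String MARKER = ".dirkey";

        // Returns the directory's stable key, creating the marker on first access.
        // The key survives renames of the directory, so the DB row can reference it.
        static UUID keyFor(Path dir) throws IOException {
            Path marker = dir.resolve(MARKER);
            if (Files.exists(marker)) {
                return UUID.fromString(Files.readString(marker).trim());
            }
            UUID key = UUID.randomUUID();
            Files.writeString(marker, key.toString());
            return key;
        }
    }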

Mope answered 22/5, 2019 at 21:35 Comment(0)

The first principle I would look at is to have a "single source of truth". Where does the name (the human-readable name) of the events/teams/participants live? In the database or in the filesystem?

The second principle: you wrote about "database entries" and "files", but these are just representations of the information in your domain. Design the data model first, and then your data sources can be organized to reflect that model.

Summing up: assign unique, immutable IDs to the entities in the domain model, make names plain attributes of your entities, and then implement your business rules as listed. You will implement your model both as a database schema and as a file structure, and you will access them through repositories that apply the same mutations to both, keeping the minimal shared knowledge (like the IDs) in sync.
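
A compact sketch of what that could look like (all names are illustrative): the entity carries an immutable ID, and a composite repository applies each mutation to both stores so the shared knowledge stays in sync.

    import java.util.Optional;
    import java.util.UUID;

    record Event(UUID id, String name) {}   // immutable ID; the name is a plain attribute

    interface EventStore {                  // implemented once for the DB, once for the filesystem
        Optional<Event> findById(UUID id);
        void create(Event event);
        void rename(UUID id, String newName);
    }

    class SyncedEventRepository {
        private final EventStore db;
        private final EventStore fs;

        SyncedEventRepository(EventStore db, EventStore fs) { this.db = db; this.fs = fs; }

        Optional<Event> findById(UUID id) { return db.findById(id); }

        void create(Event event) {          // one logical mutation, applied to both stores
            db.create(event);
            fs.create(event);
        }

        void rename(UUID id, String newName) {
            db.rename(id, newName);         // ideally one transactional unit, with
            fs.rename(id, newName);         // compensation if the second step fails
        }
    }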

But I still have the doubt that you are using too many sources. Are you sure you wouldn't be fine using just a DB or just a filesystem?

Put answered 22/5, 2019 at 21:45 Comment(0)

Use a hidden file with a name like .meta to contain some database information, at minimum the ID of the folder, and have a background process (daemon) that scans the directory hierarchy every X seconds, compares what's there with what's in the database, and makes the necessary adjustments: stuff that gets deleted on the filesystem gets a "deleted" flag in the DB, stuff that's renamed has its name changed in the database, anything new gets inserted, and if a once-deleted folder is re-created, the "deleted" flag is removed and the subsidiary files are re-created in the directory.
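
A sketch of such a daemon's skeleton (the reconciliation steps in the comments follow the rules above; the root path, interval, and everything else are assumed):

    import java.nio.file.*;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ReconcilerDaemon {

        public static void main(String[] args) {
            Path root = Paths.get("/data/events");   // assumed root of the tree
            Executors.newSingleThreadScheduledExecutor()
                     .scheduleAtFixedRate(() -> reconcile(root), 0, 30, TimeUnit.SECONDS);
        }

        static void reconcile(Path root) {
            try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, Files::isDirectory)) {
                for (Path dir : dirs) {
                    // Read the ID from dir.resolve(".meta"), then:
                    //  - ID unknown in the DB  -> insert a new row
                    //  - name differs          -> update the name in the DB
                    //  - row flagged "deleted" -> clear the flag, restore subsidiary files
                }
                // Finally: flag any DB row whose directory was not seen as "deleted".
            } catch (Exception e) {
                e.printStackTrace();   // a real daemon would log and carry on
            }
        }
    }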

Alternatively, if this is going to be an NFS drive or something like that, consider simulating the filesystem with a lightweight backend that translates delete, rename, and file-creation operations into database commands instead. Then there is only one set of data whose integrity you need to worry about, and the web app and the file layout stay in sync automatically (no need for a daemon).
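
What that translation layer might expose, purely as an illustration (a real NFS/FUSE frontend would call into something shaped like this):

    // Illustrative only: file operations arriving from the simulated filesystem
    // are turned into database commands, so the DB remains the single store.
    interface EventTreeBackend {
        void directoryCreated(String path);             // -> INSERT the matching entity
        void directoryRenamed(String from, String to);  // -> UPDATE the name; the ID is unchanged
        void directoryDeleted(String path);             // -> set the soft-delete flag
    }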

Fruitage answered 22/5, 2019 at 21:58 Comment(1)
Soft deletion is an interesting approach: it fits the requirement because event data is not completely "forgotten" after removal. I see a design smell, though: the application can mutate the file structure directly, while there should be a data-source abstraction (a repository, in a few words) that enforces that every mutation stays consistent with the DB data. The file-system access API would then just be HTTP/REST APIs that ingest files under the "eventId" aggregate. - Put
