Minimal code to reliably store a Java object in a file
C

5

8

In my tiny little standalone Java application I want to store information.

My requirements:

  • read and write Java objects (I do not want to use SQL, and querying is not required)
  • easy to use
  • easy to set up
  • minimal external dependencies

I therefore want to use JAXB to store all the information in a simple XML file in the filesystem. My example application looks like this (copy all the code into a file called Application.java and compile, no additional requirements!):

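// javax.xml.bind ships with Java 8; later JDKs need JAXB as a separate dependency
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.transform.stream.StreamSource;

// Bind the fields directly: by default JAXB only maps public members
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
class DataStorage {
    String emailAddress;
    // initialized so that add() works on a freshly created DataStorage
    List<String> familyMembers = new ArrayList<>();
    // List<Address> addresses;
}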

public class Application {

    private static JAXBContext jc;
    private static File storageLocation = new File("data.xml");

    public static void main(String[] args) throws Exception {
        jc = JAXBContext.newInstance(DataStorage.class);

        DataStorage dataStorage = load();

        // the main application will be executed here

        // data manipulation like this:
        dataStorage.emailAddress = "[email protected]";
        dataStorage.familyMembers.add("Mike");

        save(dataStorage);
    }

    protected static DataStorage load() throws JAXBException {
        if (storageLocation.exists()) {
            StreamSource source = new StreamSource(storageLocation);
            return (DataStorage) jc.createUnmarshaller().unmarshal(source);
        }
        return new DataStorage();
    }

    protected static void save(DataStorage dataStorage) throws JAXBException {
        jc.createMarshaller().marshal(dataStorage, storageLocation);
    }
}

How can I overcome these downsides?

  • Starting the application multiple times could lead to inconsistencies: Several users could run the application on a network drive and experience concurrency issues
  • Aborting the write process might lead to corrupted data or losing all data
Clam answered 18/2, 2016 at 10:43 Comment(2)
If you do not want the user to start multiple instances of your application you might consider instantiating a ServerSocket with a fixed port. If you start another instance it will throw an exception and in the catch clause you can just quit the second instance. However this approach might fail if another app uses the same port.Ptolemy
@Ptolemy If the application is stored in a shared folder, then two PCs can start the same application twice.Clam
L
2

To answer your three issues you mentioned:

Starting the application multiple times could lead to inconsistencies

Why would it lead to inconsistencies? If you mean that multiple concurrent edits will lead to inconsistencies, you just have to lock the file before editing. The easiest way is to create a lock file beside the data file. Before starting an edit, just check whether a lock file exists.

If you want to make it more fault tolerant, you could also put a timeout on the lock file, e.g. a lock file is valid for 10 minutes. You could write a randomly generated UUID into the lock file, and before saving, you could check whether the UUID still matches.
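
The lock file idea above could be sketched roughly like this. `LockFile`, its method names and the way staleness is judged (by the file's last-modified time) are assumptions made for illustration, not part of the original answer. Taking over a stale lock is still racy between two instances, so treat it as a starting point only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

/** Hypothetical helper: a lock file that carries a random token and expires after a timeout. */
final class LockFile {
    private final Path lockPath;
    private final Duration timeout;
    private final String token = UUID.randomUUID().toString();

    LockFile(Path lockPath, Duration timeout) {
        this.lockPath = lockPath;
        this.timeout = timeout;
    }

    /** Tries to take the lock; returns false if another instance holds a lock that has not expired. */
    boolean tryAcquire() throws IOException {
        if (Files.exists(lockPath)) {
            Instant modified = Files.getLastModifiedTime(lockPath).toInstant();
            if (modified.plus(timeout).isAfter(Instant.now())) {
                return false; // the existing lock is still valid
            }
            Files.deleteIfExists(lockPath); // stale lock: remove it and try to take over
        }
        try {
            // CREATE_NEW fails if the file already exists, so check and creation are a single step
            Files.write(lockPath, token.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // another instance was faster
        }
    }

    /** True if the lock file still contains our token, i.e. nobody has taken the lock over. */
    boolean stillOwned() throws IOException {
        if (!Files.exists(lockPath)) {
            return false;
        }
        String content = new String(Files.readAllBytes(lockPath), StandardCharsets.UTF_8);
        return token.equals(content);
    }

    /** Removes the lock file, but only if it is still ours. */
    void release() throws IOException {
        if (stillOwned()) {
            Files.deleteIfExists(lockPath);
        }
    }
}
```

An instance would call `tryAcquire()` before editing, `stillOwned()` right before saving, and `release()` afterwards.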

Several users could run the application on a network drive and experience concurrency issues

I think this is the same as number 1.

Aborting the write process might lead to corrupted data or losing all data

This can be solved by making the write atomic or the file immutable. To make it atomic, do not edit the file directly: write the new content to a copy, and once the copy is completely saved, rename it over the original. If you want to be on the safer side, you could append a timestamp to the file name and never edit or delete a file. Every edit then creates a copy with a newer timestamp in its name, and reading always picks the newest one.
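
A minimal sketch of the write-then-rename approach, assuming the temporary file lives next to the target so that the rename stays within one file system. `AtomicFileWriter` and the `.tmp` suffix are illustrative names. Whether the rename is truly atomic depends on the platform and file system, and the sketch does not force the data to disk before renaming.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: replaces a file by writing a sibling temp file and renaming it. */
final class AtomicFileWriter {

    static void write(Path target, byte[] content) throws IOException {
        // Temp file in the same directory, e.g. data.xml.tmp next to data.xml
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content); // creates or truncates the temp file

        try {
            // Readers see either the old or the new file, never a half-written one
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic rename; no longer atomic
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

If the process is aborted while writing, only `data.xml.tmp` is incomplete and the previous `data.xml` stays intact.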

Lonnalonnard answered 29/2, 2016 at 21:58 Comment(0)
T
7

Seeing your requirements:

  • Starting the application multiple times
  • Several users could run the application on a network drive
  • Protection against data corruption

I believe that XML-based file storage will not be sufficient. If you consider a proper relational database overkill, you could still go for an H2 database. This is a super-lightweight database that would solve all the problems above (not perfectly, but surely much better than a handwritten XML store), and it is still very easy to set up and maintain.

You can configure it to persist your changes to disk, to run as a standalone server that accepts multiple connections, or to run as part of your application in embedded mode.

Regarding the "How do you save the data" part:

In case you do not want to use any advanced ORM library (like Hibernate or any other JPA implementation) you can still use plain old JDBC. Or at least some Spring-JDBC, which is very lightweight and easy to use.

"What do you save"

H2 is a relational database, so whatever you save will end up in columns. But if you really do not plan to query your data (nor apply migration scripts to it), saving your already XML-serialized objects is an option. You can easily define a table with an ID column plus a "data" varchar column, and save your XML there. H2 imposes no practical limit on the data length for this use.

Note: Saving XML in a relational database is generally not a good idea. I am only advising you to evaluate this option, because you seem confident that you only need a certain set of features from what an SQL implementation can provide.

Tilburg answered 18/2, 2016 at 11:17 Comment(3)
@slartidan, I expanded my answer to address your concerns. Are you considering going this way?Tilburg
I think I prefer the other solutions - having to set up an (even extremely lightweight) database seems to be more overhead than using the other file locking mechanisms. But thanks a lot for your answer - it seems like this option fits other users' requirements best (it is currently the most upvoted answer).Clam
The initial overhead is surely bigger than just creating files. The final complexity of your application is the factor that decides in the end. But that is a decision you will take and live with. Most people (including me for sure) have tried and failed to get away with "I will just save them in a few files". If you are lucky, it might work out for you. Sometimes it does.Tilburg
B
3

Inconsistencies and concurrency are handled in two ways:

  • by locking
  • by versioning

Corrupted writes cannot be handled very well at the application level. The file system should support journaling, which tries to fix that up to some extent. You can also do this by

  • making your own journaling file (i.e. a short-lived separate file containing changes to be committed to the real data file).

All of these features are available even in the simplest relational databases, e.g. H2 or SQLite, and even a web page can use such features in HTML5. It is overkill to reimplement them from scratch, and a proper implementation of the data storage layer will actually make your simple needs quite complicated.

But, just for the record:

Concurrency handling with locks

Consistency (atomicity) handling with locks

  • other application instances may still try to read the file while one of the apps is writing it. This can cause inconsistency (aka dirty read). Ensure that during writing, the writer process has an exclusive lock on the file. If it is not possible to gain an exclusive lock, the writer has to wait a bit and retry.

  • an application reading the file shall read it (if it can gain access, i.e. no other instance holds an exclusive lock), then close the file. If reading is not possible (because another app holds the lock), wait and retry.

  • an external application (e.g. Notepad) can still change the XML. You may prefer an exclusive read lock while reading the file.
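
The locking rules above could be sketched with `java.nio.channels.FileLock` as follows. `LockedFileAccess`, the retry count and the delay are assumptions chosen for illustration. Note that such locks are advisory on some platforms and may be unreliable on some network file systems, so an unrelated program may still be able to modify the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: exclusive lock for writers, shared lock for readers, with retries. */
final class LockedFileAccess {
    private static final int MAX_ATTEMPTS = 10;      // assumed retry budget
    private static final long RETRY_DELAY_MS = 200L; // assumed wait between attempts

    static void write(Path file, String content) throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = acquire(channel, false); // exclusive lock
            try {
                channel.truncate(0);
                ByteBuffer buffer = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                channel.force(true); // flush content and metadata to the storage device
            } finally {
                lock.release();
            }
        }
    }

    static String read(Path file) throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            FileLock lock = acquire(channel, true); // shared lock: other readers are allowed
            try {
                ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
                while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                    // keep reading until the buffer is full or the end of the file is reached
                }
                return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
            } finally {
                lock.release();
            }
        }
    }

    /** Waits and retries until the lock is granted or the retry budget is used up. */
    private static FileLock acquire(FileChannel channel, boolean shared)
            throws IOException, InterruptedException {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            // tryLock returns null if another process holds a conflicting lock
            FileLock lock = channel.tryLock(0L, Long.MAX_VALUE, shared);
            if (lock != null) {
                return lock;
            }
            Thread.sleep(RETRY_DELAY_MS);
        }
        throw new IOException("Could not lock file after " + MAX_ATTEMPTS + " attempts");
    }
}
```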

Basic journaling

Here the idea is that if you need to do a lot of writes (or if you might later want to roll back your writes), you don't want to touch the real file. Instead:

  • writes as changes go to a separate journaling file, created and locked by your app instance

  • your app instance does not lock the main file, it locks only the journaling file

  • once all the writes are good to go, your app opens the real file with an exclusive write lock, commits every change in the journaling file, then closes the file.
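
A very small sketch of this journaling idea, under simplifying assumptions: the data file is treated as a list of lines, a "change" is one line to append, and the journal itself is not locked. `SimpleJournal` and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: collects changes in a journal file and applies them in one commit. */
final class SimpleJournal {
    private final Path dataFile;
    private final Path journalFile;

    SimpleJournal(Path dataFile, Path journalFile) {
        this.dataFile = dataFile;
        this.journalFile = journalFile;
    }

    /** Records a pending change; the real data file is not touched. */
    void record(String change) throws IOException {
        Files.write(journalFile, Collections.singletonList(change), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Discards all pending changes. */
    void rollback() throws IOException {
        Files.deleteIfExists(journalFile);
    }

    /** Applies all pending changes to the data file while holding an exclusive lock on it. */
    void commit() throws IOException {
        if (!Files.exists(journalFile)) {
            return; // nothing to commit
        }
        List<String> changes = Files.readAllLines(journalFile, StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(dataFile, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = channel.lock()) { // blocks until the exclusive lock is granted
            for (String change : changes) {
                byte[] bytes = (change + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
                ByteBuffer buffer = ByteBuffer.wrap(bytes);
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
            channel.force(true);
        }
        Files.delete(journalFile); // the journal is short-lived: remove it after the commit
    }
}
```

The main file is only locked for the short duration of `commit()`, which is the point of keeping the pending writes in a separate file.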

As you can see, the solution with locks treats the file as a shared resource that is protected by locks, so only one application can access the file at a time. This solves the concurrency issues, but also turns file access into a bottleneck. Therefore modern databases such as Oracle use versioning instead of locking. Versioning means that both the old and the new version of the file are available at the same time. Readers are served from the old, complete version. Once writing of the new version is finished, it is merged into the old version, and the new data becomes available at once. This is trickier to implement, but since it allows all applications to read in parallel at all times, it scales much better.

Bothersome answered 23/2, 2016 at 17:27 Comment(0)
M
2

note that your simple answer won't handle concurrent writes by different instances. if two instances make changes and save, simply picking the newest one will end up losing the changes from the other instance. as mentioned by other answers, you should probably try to use file locking for this.

a relatively simple solution:

  • use a separate lock file for writing "data.xml.lck". lock this when writing the file
  • as mentioned in my comment, write to a temp file first "data.xml.tmp", then rename to the final name when the write is complete "data.xml". this will give a reasonable assurance that anyone reading the file will get a complete file.
  • even with the file locking, you still have to handle the "merge" problem (one instance reads, another writes, then the first wants to write). in order to handle this you should have a version number in the file content. when an instance wants to write, it first acquires the lock. then it checks its local version number against the file version number. if it is out of date, it needs to merge what is in the file with the local changes. then it can write a new version.
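
The three points above could be combined roughly as follows. This is a sketch under assumptions: the version number is stored as the first line of the file, `VersionedStore` is a made-up name, and the merge itself is left to the caller, who gets `-1` back when the file has changed in the meantime.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical helper: lock file + temp file + version check before every write. */
final class VersionedStore {
    private final Path dataFile;
    private final Path lockFile;
    private final Path tmpFile;

    VersionedStore(Path dataFile) {
        this.dataFile = dataFile;
        this.lockFile = dataFile.resolveSibling(dataFile.getFileName() + ".lck");
        this.tmpFile = dataFile.resolveSibling(dataFile.getFileName() + ".tmp");
    }

    /** Version stored in the first line of the data file, or 0 if there is no file yet. */
    long currentVersion() throws IOException {
        if (!Files.exists(dataFile)) {
            return 0L;
        }
        List<String> lines = Files.readAllLines(dataFile, StandardCharsets.UTF_8);
        return lines.isEmpty() ? 0L : Long.parseLong(lines.get(0).trim());
    }

    /**
     * Writes the payload if the file is still at expectedVersion.
     * Returns the new version, or -1 if another instance wrote in the meantime;
     * in that case the caller has to reload, merge and try again.
     */
    long save(long expectedVersion, String payload) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // only one writer at a time
            if (currentVersion() != expectedVersion) {
                return -1L; // our copy is out of date
            }
            long newVersion = expectedVersion + 1;
            String content = newVersion + "\n" + payload;
            Files.write(tmpFile, content.getBytes(StandardCharsets.UTF_8));
            // With ATOMIC_MOVE the other options are ignored; the rename replaces the old file
            // on common platforms, so readers always see a complete file
            Files.move(tmpFile, dataFile,
                    StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
            return newVersion;
        }
    }
}
```

An instance remembers the version it loaded and passes it to `save`; a return value of `-1` signals that a merge is needed before retrying.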
Medeah answered 27/2, 2016 at 15:12 Comment(0)
C
0

After thinking about it for a while, I would want to try to implement it like this:

  • Open the data.<timestamp>.xml-file with the latest timestamp.
  • Only use readonly mode.
  • Make changes.
  • Save the file as data.<timestamp>.xml - do not overwrite and check that no file with newer timestamp exists.
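
A rough sketch of these steps, assuming the timestamp is the current time in milliseconds and the files are named `data.<millis>.xml`. `TimestampedStore` is a hypothetical name. There is still a small gap between the check for newer files and the write, and clocks on different machines may disagree, so this narrows the problem rather than removing it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical helper: never overwrites, always writes a new data.<timestamp>.xml file. */
final class TimestampedStore {
    private static final Pattern NAME = Pattern.compile("data\\.(\\d{1,18})\\.xml");
    private final Path directory;

    TimestampedStore(Path directory) {
        this.directory = directory;
    }

    /** Timestamp of the newest data file, or -1 if there is none yet. */
    long latestTimestamp() throws IOException {
        try (Stream<Path> files = Files.list(directory)) {
            return files.mapToLong(TimestampedStore::timestampOf).max().orElse(-1L);
        }
    }

    Path fileFor(long timestamp) {
        return directory.resolve("data." + timestamp + ".xml");
    }

    /**
     * Saves a new version based on the file that was loaded earlier.
     * Returns the new timestamp, or -1 if a newer file has appeared since loading.
     */
    long save(long loadedTimestamp, byte[] content) throws IOException {
        if (latestTimestamp() > loadedTimestamp) {
            return -1L; // somebody else saved in the meantime
        }
        long timestamp = Math.max(System.currentTimeMillis(), loadedTimestamp + 1);
        // CREATE_NEW refuses to overwrite an existing file with the same name
        Files.write(fileFor(timestamp), content,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        return timestamp;
    }

    /** Extracts the timestamp from a matching file name, or returns -1 for other files. */
    private static long timestampOf(Path file) {
        Matcher matcher = NAME.matcher(file.getFileName().toString());
        return matcher.matches() ? Long.parseLong(matcher.group(1)) : -1L;
    }
}
```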
Clam answered 23/2, 2016 at 13:27 Comment(1)
a "cheap" way to handle the atomic write problem, is to write to a temp file, e.g. data.<timestamp>.xml.tmp and then rename the file once the write is complete. this gives you a pretty reasonable degree of confidence that any file that exists with the correct name has been completely written.Medeah
