Looking for a simple standalone persistent dictionary implementation in C# [closed]
Asked Answered
G

13

28

For an open source project I am looking for a good, simple implementation of a Dictionary that is backed by a file. Meaning, if an application crashes or restarts the dictionary will keep its state. I would like it to update the underlying file every time the dictionary is touched. (Add a value or remove a value). A FileWatcher is not required but it could be useful.

class PersistentDictionary<T,V> : IDictionary<T,V>
{
    public PersistentDictionary(string filename)
    {

    } 
}

Requirements:

  • Open Source, with no dependency on native code (no sqlite)
  • Ideally a very short and simple implementation
  • When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.

Similar Questions

Ghassan answered 19/9, 2008 at 7:25 Comment(0)
A
24
  • bplustreedotnet

    The bplusdotnet package is a library of cross compatible data structure implementations in C#, java, and Python which are useful for applications which need to store and retrieve persistent information. The bplusdotnet data structures make it easy to store string keys associated with values permanently.

  • ESENT Managed Interface

    Not 100% managed code but it's worth mentioning it as unmanaged library itself is already part of every windows XP/2003/Vista/7 box

    ESENT is an embeddable database storage engine (ISAM) which is part of Windows. It provides reliable, transacted, concurrent, high-performance data storage with row-level locking, write-ahead logging and snapshot isolation. This is a managed wrapper for the ESENT Win32 API.

  • Akavache

    *Akavache is an asynchronous, persistent key-value cache created for writing native desktop and mobile applications in C#. Think of it like memcached for desktop apps.

- The C5 Generic Collection Library

C5 provides functionality and data structures not provided by the standard .Net System.Collections.Generic namespace, such as persistent tree data structures, heap based priority queues, hash indexed array lists and linked lists, and events on collection changes.

Averill answered 27/1, 2009 at 11:0 Comment(4)
Thanks for the tip, added them to my list of useful libraries.Mastitis
@lubos From C5 docs: A persistent sorted collection implements interface IPersistentSorted<T> and is a sorted collection of which one can efficiently make a read-only snapshot, or a “copy” that is not affected by updates to the original collection. It describes a method Snapshot() that returns a sorted collection with exactly the same items as the persistent sorted collection. The TreeSet<T> and TreeBag<T> collection classes are persistent sorted collections; their Snapshot() methods take constant time, but sub- sequent updates to the given tree take more time and space - not what is requiredRestaurateur
C5 data structures are not persistent in that way. See en.wikipedia.org/wiki/Persistent_data_structureCheng
@lubos added akavache here, it seems like a good fit.Ghassan
J
13

Let me analyze this:

  1. Retrieve information by key
  2. Persistant storage
  3. Do not want to write back the whole file when 1 value changes
  4. Should survive crashes

I think you want a database.

Edit: I think you are searching for the wrong thing. Search for a database that fits your requirements. And change some of your requirements, because I think it will be difficult to meet them all.

Jarvis answered 27/1, 2009 at 13:2 Comment(2)
I understand what you are saying, but the database part is just another implementation detail, for a drop in solution that depends on sqlite you need to have code that handles x64 vs x32 you need a repository and a bunch of other things, so though, yes an embedded db may be a solution...Ghassan
... the devil is in the detail, and there is not detail or code in this answer. I explicitly asked for an implementation of IDictionaryGhassan
S
6

one way is to use the Extensible Storage Engine built into windoows to store your stuff. It's a native win database that supports indexing, transactions etc...

Suctorial answered 27/1, 2009 at 11:3 Comment(1)
ESE is good for this problem ... aynde has some wrappers for it.Ghassan
B
2

I wrote up an implementation myself based on a very similar (I think identical) requirement I had on another project a while ago. When I did it, one thing I realised was that most of the time you'll be doing writes, you only do a read rarely when the program crashes or when it's closed. So the idea is to make the writes as fast as possible. What I did was make a very simple class which would just write a log of all the operations (additions and deletions) to the dictionary as things occurred. So after a while you get a lot of repeating between keys. Because of that, once the object detects a certain amount of repetition, it'll clear the log and rewrite it so each key and its value only appears once.

Unfortunately, you can't subclass Dictionary because you can't override anything in it. This is my simple implementation, I haven't tested it though I'm sorry, I thought you might want the idea though. Feel free to use it and change it as much as you like.

class PersistentDictManager {
    const int SaveAllThreshold = 1000;

    PersistentDictManager(string logpath) {
        this.LogPath = logpath;
        this.mydictionary = new Dictionary<string, string>();
        this.LoadData();
    }

    public string LogPath { get; private set; }

    public string this[string key] {
        get{ return this.mydictionary[key]; }
        set{
            string existingvalue;
            if(!this.mydictionary.TryGetValue(key, out existingvalue)) { existingvalue = null; }
            if(string.Equals(value, existingvalue)) { return; }
            this[key] = value;

            // store in log
            if(existingvalue != null) { // was an update (not a create)
                if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key the log
            }
            this.LogStore(key, value);
        }
    }

    public void Remove(string key) {
        if(!this.mydictionary.Remove(key)) { return; }
        if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
        this.LogDelete(key);
    }

    private void CreateWriter() {
        if(this.writer == null) {
            this.writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Open)); 
        }
    }

    private bool IncrementSaveAll() {
        ++this.saveallcount;
        if(this.saveallcount >= PersistentDictManager.SaveAllThreshold) {
            this.SaveAllData();
            return true;
        }
        else { return false; }
    }

    private void LoadData() {
        try{
            using(BinaryReader reader = new BinaryReader(File.Open(LogPath, FileMode.Open))) {
                while(reader.PeekChar() != -1) {
                    string key = reader.ReadString();
                    bool isdeleted = reader.ReadBoolean();
                    if(isdeleted) { this.mydictionary.Remove(key); }
                    else {
                        string value = reader.ReadString();
                        this.mydictionary[key] = value;
                    }
                }
            }
        }
        catch(FileNotFoundException) { }
    }

    private void LogDelete(string key) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(true); // yes, key was deleted
    }

    private void LogStore(string key, string value) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(false); // no, key was not deleted
        this.writer.Write(value);
    }

    private void SaveAllData() {
        if(this.writer != null) {
            this.writer.Close();
            this.writer = null;
        }
        using(BinaryWriter writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Create))) {
            foreach(KeyValuePair<string, string> kv in this.mydictionary) {
                writer.Write(kv.Key);
                writer.Write(false); // is not deleted flag
                writer.Write(kv.Value);
            }
        }
    }

    private readonly Dictionary<string, string> mydictionary;
    private int saveallcount = 0;
    private BinaryWriter writer = null;
}
Boyceboycey answered 30/1, 2009 at 13:41 Comment(1)
If you wanted to improve it, you could make it detect idle periods and only flush the full file then. I know you wanted a system that would only update the relevant parts of the file, but not only would that be quite difficult to write, I'm sure this will be more than enough for your needs!Boyceboycey
C
2

I was working on porting EHCache to .NET. Take a look at the project

http://sourceforge.net/projects/thecache/

Persistent caching is core functionality that is already implemented. All main Unit Tests are passing. I got a bit stuck on distributed caching, but you do not need that part.

Cosmotron answered 31/1, 2009 at 8:3 Comment(1)
Theoretically this should work properly for the problem I have. ThanksGhassan
S
1

Sounds cool, but how will you get around changes to the stored value (if it was a reference type) itself? If its immutable then all is well but if not you're kinda stuffed :-)

If you're not dealing with immutable values, I would suspect a better approach would be to handle persistence at the value level and to just rebuild the dictionary as necessary.

(edited to add a clarification)

Stronghold answered 19/9, 2008 at 9:15 Comment(1)
Id be happy for only immutable data support (I could add an interface with an OnChanged for this kind of stuff, or perhaps proxy object which can get a bit yucky)Ghassan
R
1

I think your issue is likely to be that last point:

When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.

This is exactly what a DB does - you're basically describing a simple file based table structure.

We can illustrate the problem by looking at strings.

Strings in memory are flexible things - you don't need to know the length of a string in C# when you declare its type.

In data storage strings and everything else are fixed sizes. Your saved dictionary on disk is just a collection of bytes, in order.

If you replace a value in the middle it either has to be exactly the same size or you will have to rewrite every byte that comes after it.

This is why most databases restrict text and blob fields to fixed sizes. New features like varchar(max)/varbinary(max) in Sql 2005+ are actually clever simplifications to the row only actually storing a pointer to the real data.

You can't use the fixed sizes with your example because it's generic - you don't know what type you're going to be storing so you can't pad the values out to a maximum size.

You could do:

class PersistantDictionary<T,V> : Dictionary<T,V>
    where V:struct

...as value types don't vary in storage size, although you would have to be careful with your implementation to save the right amount of storage for each type.

However your model wouldn't be very performant - if you look at how SQL server and Oracle deal with table changes they don't change the values like this. Instead they flag the old record as a ghost, and add a new record with the new value. Old ghosted records are cleaned up later when the DB is less busy.

I think you're trying to reinvent the wheel:

  • If you're dealing with large amounts of data then you really need to check out using a full-blown DB. MySql or SqlLite are both good, but you're not going to find a good, simple, open-source and lite implementation.

  • If you aren't dealing with loads of data then I'd go for whole file serialisation, and there are already plenty of good suggestions here on how to do that.

Rind answered 29/1, 2009 at 13:15 Comment(1)
MSSQL (and I'd assume other RDBMS) use a row offset table to track beginning of rows from a page.Myriam
U
0

Just use serialization. Look at the BinaryFormatter class.

Urge answered 19/9, 2008 at 9:18 Comment(3)
That will not give me the performance Im afterGhassan
You should edit the question to include your performance requirements.Hydroxyl
I did "When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value."Ghassan
B
0

I don't know of anything to solve your problem. It will need to be a fixed size structure, so that you can meet the requirements of being able to rewrite records without rewriting the entire file.

This means normal strings are out.

Batty answered 3/10, 2008 at 19:1 Comment(1)
Perhaps one can specify the max length in the ctor and have it throw when people add values that are greater?Griseldagriseldis
S
0

Like Douglas said, you need to know the fixed size of your types (both T and V). Also, variable-length instances in the object grid referenced by any of those instances are out.

Still, implementing a dictionary backed by a file is quite simple and you can use the BinaryWriter class to write the types to disk, after inheriting or encapsulating the Dictionary<TKey, TValue> class.

Saudra answered 6/10, 2008 at 23:1 Comment(7)
The whole point would be supporting dynamically sized types... SQL does it and uses a single file so I see no reason why this is not possibleGhassan
Use a database behind it then, instead of a file. I wouldn't go reinventing anything that's been done better before.Saudra
SQL uses caching and multi-threaded tasks to perform the way it does. To do that all for a single dictionary object is insane. You're requirements are unrealistic.Cellarage
"Reinventing" something like this is a great way to learn lots of neat stuff. His stated performance does not need anything like multiple threads or caching logic... beyond just keeping the whole thing in memory while running.Metallophone
After all, that's how Dictionary works and he just wants it to persist.Metallophone
Dynamic size isn't too bad either. Memory instances keep file position pointers, writing a larger value deletes and rewrites at the end... Destructor rewrites the file to compress deleted gaps. (Additionally a "gap count" could flag this to happen occasionally if the app stays up for a while)Metallophone
Add some critical sections for thread safety and you're done.Metallophone
B
0

Consider a memory mapped file. I'm not sure if there is direct support in .NET, but you could pinvoke the Win32 calls.

Bastian answered 27/1, 2009 at 11:4 Comment(0)
P
0

I'd recommend SQL Server Express or other database.

  • It's free.
  • It integrates very well with C#, including LINQ.
  • It's faster than a homemade solution.
  • It's more reliable than a homemade solution.
  • It's way more powerful than a simple disk-based data structure, so it'll be easy to do more in the future.
  • SQL is an industry standard, so other developers will understand your program more easily, and you'll have a skill that is useful in the future.
Pulsar answered 29/1, 2009 at 16:28 Comment(1)
SQL Express is not a drop in solution, its a messy complex installer, sql ce may be a solution ...Ghassan
F
-2

I am not much of a programmer, but wouldn't creating a really simple XML format to store your data do the trick?

<dico> 
   <dicEntry index="x">
     <key>MyKey</key>
     <val type="string">My val</val>
   </dicEntry>
   ...
</dico>

From there, you load the XML file DOM and fill up your dictionary as you like,

XmlDocument xdocDico = new XmlDocument();
string sXMLfile;
public loadDico(string sXMLfile, [other args...])
{
   xdocDico.load(sXMLfile);
   // Gather whatever you need and load it into your dico
}
public flushDicInXML(string sXMLfile, dictionary dicWhatever)
{
   // Dump the dic in the XML doc & save
}
public updateXMLDOM(index, key, value)
{
   // Update a specific value of the XML DOM based on index or key
}

Then whenever you want, you can update the DOM and save it on disk.

xdocDico.save(sXMLfile);

If you can afford to keep the DOM in memory performance-wise, it's pretty easy to deal with. Depending on your requirements, you may not even need the dictionary at all.

Finnish answered 27/1, 2009 at 12:55 Comment(2)
He didn't want to rewrite the entire file on updates.Steviestevy
Oups, looks like I wasn't paying enough attention.Finnish

© 2022 - 2024 — McMap. All rights reserved.