Object cache for C#

I'm building a document viewer for some document format. To keep things simple, let's say it's a PDF viewer, a desktop application. One requirement for the software is rendering speed. So, right now, I'm caching the images for the next pages while the user is scrolling through the document.

This works: the UI is very responsive, and the application seems to render pages almost instantly... at a cost: memory usage sometimes goes up to 600 MB, because I cache everything in memory.

Now, I know I can cache to disk, but doing that all the time is noticeably slower. What I would like is to implement some cache (LRU?) where some of the cached pages (image objects) are in memory and most of them are on disk.

Before I embark on this, is there something in the framework, or some library out there, that will do this for me? It seems like a common enough problem. (This is a desktop application, not ASP.NET.)

Alternatively, do you have other ideas for this problem?

Forefinger answered 24/2, 2009 at 9:44 Comment(0)

I wrote an LRU Cache and some test cases, feel free to use it.

You can read through the source on my blog.

For the lazy (here it is minus the test cases):

using System;
using System.Collections.Generic;

namespace LRUCache {
    // A linked list plus a dictionary index, giving O(1) removal of an
    // arbitrary value; used below to track recency order.
    public class IndexedLinkedList<T> {

        LinkedList<T> data = new LinkedList<T>();
        Dictionary<T, LinkedListNode<T>> index = new Dictionary<T, LinkedListNode<T>>();

        public void Add(T value) {
            index[value] = data.AddLast(value);
        }

        public void RemoveFirst() {
            index.Remove(data.First.Value);
            data.RemoveFirst();
        }

        public void Remove(T value) {
            LinkedListNode<T> node;
            if (index.TryGetValue(value, out node)) {
                data.Remove(node);
                index.Remove(value);
            }
        }

        public int Count {
            get {
                return data.Count;
            }
        }

        public void Clear() {
            data.Clear();
            index.Clear();
        }

        public T First {
            get {
                return data.First.Value;
            }
        }
    }
}

LRUCache

using System;
using System.Collections.Generic;

namespace LRUCache {
    public class LRUCache<TKey, TValue> : IDictionary<TKey, TValue> {

        object sync = new object();
        Dictionary<TKey, TValue> data;
        IndexedLinkedList<TKey> lruList = new IndexedLinkedList<TKey>();
        ICollection<KeyValuePair<TKey, TValue>> dataAsCollection;
        int capacity;

        public LRUCache(int capacity) {

            if (capacity <= 0) {
                throw new ArgumentException("capacity must be greater than 0");
            }

            data = new Dictionary<TKey, TValue>(capacity);
            dataAsCollection = data;
            this.capacity = capacity;
        }

        public void Add(TKey key, TValue value) {
            if (!ContainsKey(key)) {
                this[key] = value;
            } else {
                throw new ArgumentException("An attempt was made to insert a duplicate key in the cache.");
            }
        }

        public bool ContainsKey(TKey key) {
            return data.ContainsKey(key);
        }

        public ICollection<TKey> Keys {
            get {
                return data.Keys;
            }
        }

        public bool Remove(TKey key) {
            bool existed = data.Remove(key);
            lruList.Remove(key);
            return existed;
        }

        public bool TryGetValue(TKey key, out TValue value) {
            // Note: this lookup does not refresh the key's position in the
            // LRU order; use the indexer if the access should count as a use.
            return data.TryGetValue(key, out value);
        }

        public ICollection<TValue> Values {
            get { return data.Values; }
        }

        public TValue this[TKey key] {
            get {
                var value = data[key];
                lruList.Remove(key);
                lruList.Add(key);
                return value;
            }
            set {
                data[key] = value;
                lruList.Remove(key);
                lruList.Add(key);

                if (data.Count > capacity) {
                    // Evict the least recently used entry (the front of the list).
                    data.Remove(lruList.First);
                    lruList.RemoveFirst();
                }
            }
        }

        public void Add(KeyValuePair<TKey, TValue> item) {
            Add(item.Key, item.Value);
        }

        public void Clear() {
            data.Clear();
            lruList.Clear();
        }

        public bool Contains(KeyValuePair<TKey, TValue> item) {
            return dataAsCollection.Contains(item);
        }

        public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex) {
            dataAsCollection.CopyTo(array, arrayIndex);
        }

        public int Count {
            get { return data.Count; }
        }

        public bool IsReadOnly {
            get { return false; }
        }

        public bool Remove(KeyValuePair<TKey, TValue> item) {

            bool removed = dataAsCollection.Remove(item);
            if (removed) {
                lruList.Remove(item.Key);
            }
            return removed;
        }


        public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator() {
            return dataAsCollection.GetEnumerator();
        }


        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
            return ((System.Collections.IEnumerable)data).GetEnumerator();
        }

    }
}
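
For illustration, here is how the cache above might be used to cap how many rendered pages stay in memory; the page-number/image pairing is just a placeholder, assuming the two classes above are compiled alongside:

```csharp
using System;
using LRUCache;

class Program {
    static void Main() {
        // Keep at most 3 pages in memory; adding a 4th evicts the least recently used.
        var cache = new LRUCache<int, string>(3);
        cache[1] = "page-1-image";
        cache[2] = "page-2-image";
        cache[3] = "page-3-image";

        var _ = cache[1];          // touch page 1 so it becomes most recently used
        cache[4] = "page-4-image"; // evicts page 2, the least recently used entry

        Console.WriteLine(cache.ContainsKey(2)); // False
        Console.WriteLine(cache.ContainsKey(1)); // True
    }
}
```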
Farris answered 26/2, 2009 at 6:4 Comment(12)
I fixed the link, but I recall there is a bug in the code, I will try to fix it.Farris
I updated the code, and wrote a few tests which did not find any flaws, not to say it has no bugsFarris
I think it would be better to put the removal of "least recently used" items in the add and remove methods, rather than in the indexer. I wouldn't want to take the performance hit every time I accessed something in my cache, because I don't care if the cache is "full" when I'm simply accessing. However, when I add, I definitely care whether the cache is full, and do indeed want to purge the least recently used item(s) at that time.Lexicostatistics
In your Set indexer on LRUCache I think you are removing twice: if (data.Count > capacity) { Remove(lruList.First); lruList.RemoveFirst(); } The call to remove removes the first element, and then you do an lruList.RemoveFirst() ... is this right?Importance
@Importance that is because it handles cache expiry afaikFarris
Right - but doesn't it remove the expired element twice? Once in the body of Remove, and then in lruList.RemoveFirst ... I'm getting a NullReferenceException in lruList.RemoveFirst for some reason (although I protect all access to the LRUList through locks). Thanks for the quick reply!Importance
To clarify, the Set indexer calls Remove, which does _lruList.Remove(key); where key is _lruList.First, and then the indexer does _lruList.RemoveFirst() ... unless you are intentionally removing two elements from _lruList?Importance
Thanks Sam - and many thanks for posting the code - much better than rolling my own.Importance
Is the latest code here or at samsaffron.com/archive/2009/08/21/… ?Pinguid
@Pinguid I think that is actually older than this, the code here is probably tested a bit betterFarris
This won't work. You are not handling capacity properly. Also, your indexed linked list impl won't work either, especially in a multithreaded environment.Goutweed
@Goutweed it is not meant to be thread safe, happy to fix any specific bugsFarris

For .NET 4.0 and later, you can also use MemoryCache from System.Runtime.Caching.

http://msdn.microsoft.com/en-us/library/system.runtime.caching.aspx
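
A minimal sketch of what that might look like for the page-image scenario; the key scheme and the two-minute sliding expiration are arbitrary choices, not anything the question specifies (requires a reference to System.Runtime.Caching):

```csharp
using System;
using System.Runtime.Caching;

class PageCache {
    static readonly MemoryCache cache = MemoryCache.Default;

    // Returns the cached image for a page, rendering it on a miss.
    public static byte[] GetPage(int pageNumber, Func<int, byte[]> render) {
        string key = "page:" + pageNumber;
        var cached = cache.Get(key) as byte[];
        if (cached != null) return cached;

        byte[] image = render(pageNumber);
        // The runtime can evict entries under memory pressure; a sliding
        // expiration also drops pages that haven't been viewed for 2 minutes.
        var policy = new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(2) };
        cache.Set(key, image, policy);
        return image;
    }
}
```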

Picro answered 15/3, 2011 at 21:4 Comment(0)

There's the patterns & practices Enterprise Library (more specifically, its Caching Application Block), but IMO it tends to be over-engineered and overly complex.

Joceline answered 24/2, 2009 at 9:50 Comment(0)

The .NET Framework has always had the ability to keep weak references to objects.

Basically, weak references are references to objects that the runtime considers "unimportant" and that may be removed by a garbage collection run at any point in time. This can be used, for example, to cache things, but you'd have no control over what gets collected and what doesn't.

On the other hand, it's very simple to use and it may just be what you need.

Dave
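
A sketch of the idea, just to make the trade-off concrete; the class name and shape are made up for illustration, using the non-generic WeakReference available since .NET 1.0:

```csharp
using System;
using System.Collections.Generic;

// A weak-reference cache: entries survive only until the GC collects them,
// so there is no control over eviction timing.
class WeakCache<TKey, TValue> where TValue : class {
    readonly Dictionary<TKey, WeakReference> refs = new Dictionary<TKey, WeakReference>();

    public void Add(TKey key, TValue value) {
        refs[key] = new WeakReference(value);
    }

    public bool TryGetValue(TKey key, out TValue value) {
        value = null;
        WeakReference wr;
        if (refs.TryGetValue(key, out wr)) {
            value = wr.Target as TValue; // null if the GC already collected it
        }
        return value != null;
    }
}
```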

Yip answered 24/2, 2009 at 10:3 Comment(7)
i read about this a long time ago and forgot about this, thanks!Forefinger
these kinds of solutions can backfire in this scenario. Imagine you are pre-loading images, then putting them in a hashtable of weak references. If no request for an image comes in and the GC runs, it will be deleted. Then when the user scrolls, you need to regenerate it. This could all happen within seconds.Dowser
Preloading and caching are not the same thing, and you should take care not to confuse them.Yip
you are right, but the original question sounds a lot more like a preloading scenario to me: "So, right now, I'm caching the image for the next pages while the user is scrolling through the document." Why would you want it to be collected before they scroll? And why keep them in memory after?Dowser
In that case, the Caching application block is also not going to be of much use. Someone should come up with a Preloading application block. ;)Yip
caching can be turned into preloading: the software caches data the user will access in the future, or at least tries to predict it.Forefinger
Perhaps, but I don't think it's the same thing. The preloading of data, and the prediction of which data to load is very distinct from the decision which data to remove. They are inherently related: you can probably come up with a good algorithm to decide which data to preload, but you'd have no control over how long that data is kept.Yip

A classic trade-off situation. Keeping everything in memory will be fast at the cost of massively increased memory consumption, whilst retrieving from disk decreases memory consumption but isn't as performant. However, you already know all this!

The built-in System.Web.Caching.Cache class is great, and I've used it to good effect many times in my ASP.NET applications (although mostly for database record caching). However, the drawback is that the cache will only run on one machine (typically a sole web server) and cannot be distributed across multiple machines.

If it's possible to "throw some hardware" at the problem, and it doesn't necessarily need to be expensive hardware, just boxes with plenty of memory, you could always go with a distributed caching solution. This will give you much more memory to play with whilst retaining (nearly) the same level of performance.

Some options for a distributed caching solution for .NET are:

Memcached.NET

indeXus.Net

or even Microsoft's own Velocity project.

Bohner answered 24/2, 2009 at 10:23 Comment(2)
So instead of caching it on disk, you'd do it over the network? Yup, that'll be a lot faster.Yip
Dave Van den Eynde: Caching on a local, gigabit network is a very common approach. It is not feasible in this case, however.Aucoin

How are you implementing your cache?

You can use the Cache class from System.Web.Caching, even in non-web applications, and it will purge items on an LRU basis if/when it needs the memory.

In a non-web application you'll need to use HttpRuntime.Cache to access the Cache instance.

Note that the documentation states that the Cache class isn't intended to be used outside of ASP.NET, although it's always worked for me. (I've never relied on it in any mission-critical app though.)
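
A rough sketch of using it from a desktop app; the key scheme and the five-minute sliding expiration here are illustrative choices, not part of the Cache API itself (requires a reference to System.Web):

```csharp
using System;
using System.Web;          // reference System.Web.dll, even in a desktop app
using System.Web.Caching;

class AspNetCacheExample {
    public static void CachePage(int pageNumber, byte[] image) {
        // HttpRuntime.Cache works outside of a web request; items can be
        // evicted when memory runs low, or after the sliding expiration.
        HttpRuntime.Cache.Insert(
            "page:" + pageNumber,
            image,
            null,                         // no cache dependency
            Cache.NoAbsoluteExpiration,
            TimeSpan.FromMinutes(5));     // sliding expiration
    }

    public static byte[] GetPage(int pageNumber) {
        return HttpRuntime.Cache.Get("page:" + pageNumber) as byte[];
    }
}
```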

Effector answered 24/2, 2009 at 9:51 Comment(2)
oh wow, I thought it was just for ASP.NET. Will try this :) Right now it's just a straightforward hashtable.Forefinger
The Asp.Net cache is not intended to be used outside Asp.Net, so be sure to test your application properly. But it should work well of course :-)Amaral

The Caching Application Block and the ASP.NET cache are both options; however, although they do LRU, the only kind of disk utilization that happens is via memory paging. I think there are ways to optimize this that are more specific to your goal. Here are some thoughts:

  • Since it's an ASP.NET app, why not generate the images, write them to disk, and when the browser requests the next page, have IIS serve it up? That keeps your ASP.NET worker process's memory lower while letting IIS do what it's good at.
  • Use statistics about how a user interacts. LRU as implemented in the ASP.NET cache will typically apply to the individual image files - not much use for this scenario by the sounds of it. Instead, maybe some logic like: "If the user has scrolled X pages in the last Y seconds, then generate the next X*Y images". Users scrolling quickly will get more pages generated; users reading slowly will need fewer cached.
  • Have IIS serve images from disk, and use a custom HTTP handler for images you really want to control the caching of.
  • Have the browser request the files ahead of time too, and rely on browser caching.
  • Once an image has been served to the client, is it fair to say it can pretty much be removed from the cache? That could reduce the footprint substantially.

I'd certainly avoid using a plain hash table though.
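
The scroll-rate heuristic from the second bullet could be sketched like this; the class and method names are hypothetical, and the literal X*Y formula is taken straight from the comment above:

```csharp
using System;

static class Prefetcher {
    // If the user scrolled X pages in the last Y seconds, queue up the
    // next X*Y pages for rendering (always at least one page ahead).
    public static int PagesToPrefetch(int pagesScrolledX, int secondsY) {
        return Math.Max(1, pagesScrolledX * secondsY);
    }
}
```

So a user who flipped through 3 pages in the last 2 seconds gets 6 pages queued, while an idle reader keeps a prefetch window of just one page.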

Dowser answered 24/2, 2009 at 10:9 Comment(0)

In addition to rsbarro's answer suggesting MemoryCache, I recommend using PostSharp AOP as described in

http://www.codeproject.com/Articles/493971/Leveraging-MemoryCache-and-AOP-for-expensive-calls

Roshan answered 18/1, 2013 at 20:56 Comment(0)

There is an efficient, open-source RAM virtualizer that uses an MRU algorithm to keep the most recently referenced objects in memory, with a fast, lightweight backing store (on disk) for "paging".

Here is a link to a Code Project mini-article about it: http://www.codeproject.com/Tips/827339/Virtual-Cache

I hope you find it useful.

Deity answered 11/2, 2015 at 9:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.