Is there a way to derive IEqualityComparer from IComparer?
Asked Answered
B

4

6

TL;DR I'm looking for a way to obtain IEqualityComparer<T> from IComparer<T>, no matter which datatype is T, including case-insensitive options if T is string. Or I need a different solution for this problem.

Here's full story: I'm implementing simple, generic cache with LFU policy. Requirement is that it must be possible to select whether the cache will be case sensitive or case insensitive -- if string happens to be the datatype for cache keys (which is not necessary). In the solution I primarily develop the cache for, I expect hundreds of billions of cache lookups, and cache sizes of max 100.000 entries. Because of that numbers I immediately resigned from using any string manipulation that causes allocations (such as .ToLower().GetHashCode() etc.) and instead opted to use IComparer and IEqualityComparer, as they are standard BCL features. User of this cache can pass the comparers to constructor. Here are relevant fragments of the code:

public class LFUCache<TKey,TValue>
{
    private readonly Dictionary<TKey,CacheItem> entries;
    private readonly SortedSet<CacheItem> lfuList;

    private class CacheItem
    {
        public TKey Key;
        public TValue Value;
        public int UseCount;
    }

    private class CacheItemComparer : IComparer<CacheItem>
    {
        private readonly IComparer<TKey> cacheKeyComparer;

        public CacheItemComparer(IComparer<TKey> cacheKeyComparer)
        {
            this.cacheKeyComparer = cacheKeyComparer;
            if (cacheKeyComparer == null)
                this.cacheKeyComparer = Comparer<TKey>.Default;
        }

        public int Compare(CacheItem x, CacheItem y)
        {
            int UseCount = x.UseCount - y.UseCount;
            if (UseCount != 0) return UseCount;
            return cacheKeyComparer.Compare(x.Key, y.Key);
        }
    }

    public LFUCache(int capacity, IEqualityComparer<TKey> keyEqualityComparer,
                  IComparer<TKey> keyComparer)  // <- here's my problem
    {
        // ...
        entries = new Dictionary<TKey, CacheItem>(keyEqualityComparer);
        lfuList = new SortedSet<CacheItem>(new CacheItemComparer(keyComparer));
    }
    // ...
}

The keyEqualityComparer is used to manage cache entries (so e.g. the key "ABC" and "abc" are equal if user wants to). The keyComparer is used to manage cache entries sorted by UseCount so that it's easy to select the least frequently used one (implemented in CacheItemComparer class).

Example correct usage with custom comparison:

var cache = new LFUCache<string, int>(10000,
    StringComparer.InvariantCultureIgnoreCase,
    StringComparer.InvariantCultureIgnoreCase);

(That looks stupid, but StringComparer implements both IComparer<string> and IEqualityComparer<string>.) The problem is that if user gives incompatible comparers (i.e. case insensitive keyEqualityComparer and case sensitive keyComparer), then the most likely outcome is invalid LFU statistics, and thus lower cache hits at best. The other scenario is also less than desired. Also if the key is more sophisticated (I'll have something resembling Tuple<string,DateTime,DateTime>), it's possible to mess it up more severely.

That's why I'd like to only have a single comparer argument in constructor, but that doesn't seem to work. I'm able to create IEqualityComparer<T>.Equals() with help of IComparer<T>.Compare(), but I'm stuck at IEqualityComparer<T>.GetHashCode() -- which is very important, as you know. If I had got access to private properties of the comparer to check if it's case sensitive or not, I would have used CompareInfo to get hash code.

I like this approach with 2 different data structures, because it gives me acceptable performance and controllable memory consumption -- on my laptop around 500.000 cache additions/sec with cache size 10.000 elements. Dictionary<TKey,TValue> is just used to find data in O(1), and SortedSet<CacheItem> inserts data in O(log n), find element to remove by calling lfuList.Min in O(log n), and find the entry to increment use count also in O(log n).

Any suggestions on how to solve this are welcome. I'll appreciate any ideas, including different designs.

Bustee answered 15/6, 2015 at 17:12 Comment(7)
One possibility is to use generic constraints to define a static factory method that takes a single comparer parameter that implements both IEqualityComparer<T> and IComparer<T>. Then at least you don't have pass in the same object to two different parameters.Staging
That sounds interesting, however somehow I can't manage to get an idea how the code should look like. Can you share a few rough lines of code? ;-)Bustee
I think it could worth to check whether it is possible to implement the SortedSet by yourself. If you increment UseCount there could only be a change of one positon upwards in this set. Maybe there is an implementation of O(1) for touch. Insert should be O(1) because it's added at the end of the list (maybe the SortedSet will do this). I'll try a new sample implementation.Buryat
@Verarind Yeah, the builtin SortedSet is a beast based on Red-Black Trees, it's very robust, and everything there is a O(log n). But as I noted below, LFU in my use-case works worse (which is counter-intuitive) than LRU. So I'll probably leave this a little bit crippled, but working, implementation somewhere, and do a LRU stuff with linked list as you suggested.Bustee
Yes but I think there is an implementation of LFU that only takes O(1). I'll make a try and post a new answer.Buryat
There's one that seems interesting - dhruvbird.com/lfu.pdfBustee
I already posted my code...Buryat
S
2

As I alluded to in my comment, you could add a helper method that might make things a little simpler for a basic use case:

public class LFUCache<TKey,TValue>
{
    public static LFUCache<TKey, TValue> Create<TComp>(int capacity, TComp comparer) where TComp : IEqualityComparer<TKey>, IComparer<TKey>
    {
        return new LFUCache<TKey, TValue>(capacity, comparer, comparer);
    }
}

and you'd use it like this:

var cache = LFUCache<string, int>.Create(10000, StringComparer.InvariantCultureIgnoreCase);
Staging answered 15/6, 2015 at 18:55 Comment(2)
Just because in this particular case the comparer happened to implement IEqualityComparer and IComparer doesn't mean that'll always be the case. For all of those other situations he'll need to create a new class to wrap the two comparers he has, and then that class will need to ensure that the two comparers share a single identity. This is, more or less, just pushing the problem elsewhere, not removing it.Octennial
@Octennial The OP asked for code related to my comment and it was a little too long for comment. I agree, its just making it simpler to what the OP was doing anyway. I would probably leave the constuctor with two comparers for exactly that use case. In general, there's not even a guarantee that GetHashCode and Equals are implemented consistently for IEqualityComparer<T> let alone consistent with some IComparer<T>. This is why we have code review and automated tests.Staging
O
6

It's not possible to implement an IComparer from an IEqualityComparer as you have no way of knowing whether an unequal item is greater than or less than the other item.

It's not possible to implement an IEqualityComparer from an IComparer as there's no way for you to generate a hash code that is in line with the IComparer's identity.

That said, there's no need for you to have both types of comparers in your case. When computing LRU you're comparing the time since an item was used as the primary comparer and then comparing based on a passed in comparer as a tiebreaker. Just remove that last part; don't have a tiebreaker. Let it be undefined which item leaves the cache when there is a tie for the least recently used. When you do that you only need to accept an IEqualityComparer, not an IComparer.

Octennial answered 15/6, 2015 at 17:43 Comment(18)
If you add the last access time as key you can't use a dictionary to find a key because you don't know when it was accessed the last time and therefore you can't calculate the key to lookup the dictionary. Maybe I don't understand your solution correctly (english is not my mother tongue - sorry).Buryat
@Verarind I'm not suggesting the OP change his solution. He apparently already has a working solution in which he has a side by side dictionary an sortedset, using the dictionary for lookups and the sorted set to determine what items to remove. The item in the sortedset has a key as a property, which can be used to find the item in the dictionary, if that's what you're asking.Octennial
Thanks very much. Yeah, maybe I'll evaluate once again LRU vs LFU benefits. LRU is easier to implement with just dictionary and doubly linked list, and have more O(1) properties. However I don't know how the change will impact cache hit ratio. I need some more work on it.Bustee
@Octennial I understand. He has a solution that needs an IComparer. You told him to change it and use a TimeStamp for the sorted set. Yes you don't ask him to change it completely. His solution does also need O(log n) to touch an element. Why don't change it completely to a solution that works in O(1) in every case and don't need any UseCount? He also sayed that different designs are welcome.Buryat
@Verarind No, all I told him to do was to not use the actual object's comparison for a tiebreaker of the item that should be dropped, and that he should just let the ties happen. If, with that change, his solution is fine, then great; nothing else to do here. If his entire solution, in addition to having this problem, is also too slow, then that's a completely different issue that's entirely separate from the question being asked. He seemed to feel the rest of the code was working fine, so if it is, I see no reason to re-write the whole thing.Octennial
Oh yes. I also missread the F in LFU. My answer was only for LRU. Sorry.Buryat
@Octennial I don't need ties for dropped object, I just need full comparison to be able to store all items in SortedSet.Bustee
Thanks for all your comments - it turned out that LRU works better. It's like "moving average" -- if one item was previously used very often, but now doesn't occur in the dataset, then with LRU it eventually will be expunged, and with LFU it could persist indefinitely. However I find this exercise very interesting, and your answers and comments invaluable.Bustee
@Bustee You can compare them entirely based on the use count, and not use a comparison of the keys as a tiebreaker for items accessed an equal number of times.Octennial
It is not entirely impossible to generate an IEqualityComparer from an IComparer since one can just return 0 as hashcode and it is valid.Dielu
@Dielu Such a comparer would have virtually no value though. You shouldn't be using a hash-based comparison if you don't actually have a meaningful hash; you'll be worse off over just not using a hash based data structure.Octennial
@Octennial Sure. Just telling u that it's not impossible as your answer alludes.Dielu
@Dielu It's functionally impossible. It's impossible to create one that could actually be used, and that wouldn't cause more harm than good.Octennial
@Octennial It's "always" functionally possible, but "sometimes/mostly" practically pointless. Not related to OP, but if u have a class Person with comparison based on Name property, then the hash for the comparer class can be Name.GetHashCode.Dielu
@Dielu But that's not accepting an arbitrary IComparer and creating an IEqualityComparer from it. That's a programmer creating an IEqualityComparer based on knowing the specific details of the IComparer implementation at compile time. That is something you can almost always do. Creating an IEqualityComparer from an IComparer in the general case is something that, for all practical purposes, you can't ever do.Octennial
@Octennial oh you're right, I missed the context. But my overall point stands. That it is not technically impossible to have a generic hash function for IEqualityComparer from IComparer, just that it is not practical. I thought your answer doesnt allude that.Dielu
@Dielu It's something that you would never ever want to do, as it would pretty much universally cause more harm than good to attempt to use that as a solution. I see no reason to advocate using such a solution in an answer. It's not like it actually solves the problem that the question is asking about; it would just make it much, much worse.Octennial
@Octennial There's a difference between saying "something is impossible" and "something is practically pointless". A wrong information is not the right way to encourage correct practice. I know mine is a minor nitpicking but just for correctness a better answer should sound something like "there is no meaningful way to get equality comparer from comparer" or so. And btw, I wasnt asking u to advocate a poor solution. I hope you see the difference.Dielu
S
2

As I alluded to in my comment, you could add a helper method that might make things a little simpler for a basic use case:

public class LFUCache<TKey,TValue>
{
    public static LFUCache<TKey, TValue> Create<TComp>(int capacity, TComp comparer) where TComp : IEqualityComparer<TKey>, IComparer<TKey>
    {
        return new LFUCache<TKey, TValue>(capacity, comparer, comparer);
    }
}

and you'd use it like this:

var cache = LFUCache<string, int>.Create(10000, StringComparer.InvariantCultureIgnoreCase);
Staging answered 15/6, 2015 at 18:55 Comment(2)
Just because in this particular case the comparer happened to implement IEqualityComparer and IComparer doesn't mean that'll always be the case. For all of those other situations he'll need to create a new class to wrap the two comparers he has, and then that class will need to ensure that the two comparers share a single identity. This is, more or less, just pushing the problem elsewhere, not removing it.Octennial
@Octennial The OP asked for code related to my comment and it was a little too long for comment. I agree, its just making it simpler to what the OP was doing anyway. I would probably leave the constuctor with two comparers for exactly that use case. In general, there's not even a guarantee that GetHashCode and Equals are implemented consistently for IEqualityComparer<T> let alone consistent with some IComparer<T>. This is why we have code review and automated tests.Staging
B
1

Okay next try. Here is an implementation for Add and Touch for LFU:

public class LfuCache<TKey, TValue>
{
    private readonly Dictionary<TKey, LfuItem> _items;

    private readonly int _limit;

    private LfuItem _first, _last;

    public LfuCache(int limit, IEqualityComparer<TKey> keyComparer = null)
    {
        this._limit = limit;
        this._items = new Dictionary<TKey,LfuItem>(keyComparer);
    }

    public void Add(TKey key, TValue value)
    {
        if (this._items.Count == this._limit)
        {
            this.RemoveLast();
        }

        var lfuItem = new LfuItem { Key = key, Value = value, Prev = this._last };
        this._items.Add(key, lfuItem);

        if (this._last != null)
        {
            this._last.Next = lfuItem;
            lfuItem.Prev = this._last;
        }

        this._last = lfuItem;

        if (this._first == null)
        {
            this._first = lfuItem;
        }
    }

    public TValue this[TKey key]
    {
        get
        {
            var lfuItem = this._items[key];
            ++lfuItem.UseCount;

            this.TryMoveUp(lfuItem);

            return lfuItem.Value;
        }
    }

    private void TryMoveUp(LfuItem lfuItem)
    {
        if (lfuItem.Prev == null || lfuItem.Prev.UseCount >= lfuItem.UseCount) // maybe > if you want LRU and LFU
        {
            return;
        }

        var prev = lfuItem.Prev;
        prev.Next = lfuItem.Next;
        lfuItem.Prev = prev.Prev;
        prev.Prev = lfuItem;

        if (lfuItem.Prev == null)
        {
            this._first = lfuItem;
        }
    }

    private void RemoveLast()
    {
        if (this._items.Remove(this._last.Key))
        {
            this._last = this._last.Prev;
            if (this._last != null)
            {
                this._last.Next = null;
            }
        }
    }

    private class LfuItem
    {
        public TKey Key { get; set; }

        public TValue Value { get; set; }

        public long UseCount { get; set; }

        public LfuItem Prev { get; set; }

        public LfuItem Next { get; set; }
    }
}

In my opinion it looks like that Add and Touch is in O(1), isn't it?

Currently I don't see any use case for _first but maybe anyone else need it. To remove an item _last should be enough.

EDIT A single linked list will also do if you don't need a MoveDown operation. EDIT No a single linked list will not work because MoveUp need the Next pointer to change it's Prev pointer.

Buryat answered 15/6, 2015 at 20:2 Comment(7)
Looks very good, however I can't see the code to enforce cache size limit (which is where things start to become tricky).Bustee
Good point. I updated Addand added RemoveLast. This should illustrate how it works.Buryat
Hm looking at TryMoveUp() and I think there must be a loop. For example if we have entries with use count 3, 2, 2, 2, 2, 2, 1 and the rightmost "2" is being touched so its use count becomes 3, it should move 4 times to the left. Otherwise the list will not have most frequently used items close to the top. So this is the only place where we will have O(n).Bustee
Yeah. I realized that some minutes ago. There is an error in my assumption. There could be more than 1 MoveUp! If all n entries have the same UseCount the last item has to move up to the first position that is an O(n) operation. Sorry. This implementation has a best case of O(1) but a worst case of O(n). Don't know whether it will be better to use a SortedSet that has O(log n) in worst case or the document you found. Sorry for that.Buryat
I think the best approach is the one from the paper I linked. No matter if we make a perfect solution now or not, this case IMO was very interesting. I learned that LFU cache has its own problems, and that it's challenging to be implemented properly. For my current use case I'm reverting to LRU. Thank you for your help on this.Bustee
I read the paper you linked. Very interesting. The base idea might be the same that I had. They're using two doubly linked lists - one for the UseCount and one for the items of the same UseCount. Using this approach there is only one "MoveUp" in the UseCount-list, one remove operation of the old list and one add operation in the new list. Sounds very simple and straight forward. Very good paper. I gave a credit for finding it. Maybe sometimes I'll need it (and hopefully I'll remember this thread).Buryat
Thanks, I added the paper to my favorites in Algorithms/Caching :-)Bustee
T
1

Instead of taking an IEqualityComparer and an IComparer in your constructor, you could try taking an IComparer and a lambda which defines GetHashCode(). Then build an IEqualityComparer based on if(IComparer==0) and GetHashCode() = lambda.

Although I would say it is small, you still have the risk of getting HashCode mismatches when IComparer returns 0. If you want to make it super clear to the user of your code, you could always extend the strategy by taking two lambdas in the constructor: Func<T,T,int> used for both IComparer and IEqualityComparer, and Func<T,int> for GetHashCode.

Touristy answered 5/5, 2020 at 17:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.