Is there a more efficient implementation for a bidirectional map?
Asked Answered
L

6

85

I created a simple bidirectional map class that works by internally storing two std::map instances, with opposite key/value types, and providing a user-friendly interface:

template<class T1, class T2> class Bimap
{
    std::map<T1, T2> map1;
    std::map<T2, T1> map2;
    // ...
};
  • Is there a more efficient method of implementing a bidirectional map that doesn't require twice the memory?

  • How is a bimap usually implemented?


EDIT:

  • Should bimap element be mutable or immutable? (Changing one element in map1 should change the key in map2, but keys are const and that's impossible - what's the solution?)

  • Ownership of elements is also another problem: when a user inserts a key-value pair in the bimap, the bimap should make a copy of that key-value pair and store it, then the internal second map (with inverted key/value) should not copy but point to the original pair. How can this be achieved?


EDIT 2:

I've posted a possible implementation I made on Code Review.

Lagan answered 13/2, 2014 at 16:48 Comment(5)
"Should bimap element be mutable or immutable?" Well, do you need it mutable or not? Everything depends on your requirements. What do you need that bimap for? If it's an exercise for exercise's sake, then you're free to assume things either way - better yet, try it both ways!Linoleum
@KubaOber: In all honesty I did check the source code for boost::bimap beforehand, but I didn't find it particularly helpful as it is heavily templatized and the interesting details about the bimap implementation are not really exposed. It's certainly not easy to read and also does not make use of C++11 features (which may potentially help implementing a bimap from scratch). Maybe I should have mentioned it in the first post. Also, I did look on Google for "bidirection map implementation", "bimap tutorial" and so on, but couldn't find a good article.Lagan
@KubaOber: The mutability/immutability problem surfaced after I tested using an std::map<T2, const T1*> as the second map. The issue is that, if I use a T2 that is supposed to be user-mutable, it becomes somewhat hard to modify that T2 instance both in the first and in the second map. It would probably be required to erase the key/value pair from the second map and insert a new one with the modified key. Then I realized that maybe the internal representation of the bimap should depend on whether the key/value pairs should be mutable or not? Maybe I should check if the types are POD?Lagan
If the types are POD, I could just use the current implementation, and maybe check via templates if the sizeof is big, and if so use the "const pointer" second map implementation. I opened this question because, after looking on the web and trying various things, I still have a lot of doubts - I believe there must be some kind of data structure that allows a bidirectional key/value interaction while somewhat maintaining the efficiency of a normal map.Lagan
@VittorioRomeo Well there was the proposal of making a vector of elements + 2 maps of pointers, and then mine to have a set of pairs. Your solution has a vector of pairs + 2 sets of pointers. Great combination skills dude :)Acadian
G
30

There is a certain problem with double-storing your data in all simple implementations of a bimap. If you can break it down to a bimap of pointers from outside, then you can readily ignore this and simply keep both maps of the form std::map<A*,B*> like Arkaitz Jimenez already suggested (though contrary to his answer you have to care about the storage from outside to avoid a A->A* lookup). But if you have the pointers anyway, why not simply store a std::pair<A,B> at the point where you would otherwise store A and B separately?

It would be nice to have std::map<A,B*> instead of std::map<A*,B*> as this would allow for example the lookup of an element associated to an string by a newly created string with the same content instead of the pointer to the original string that created the pair. But it is customary to store a full copy of the key with every entry and only rely on the hash to find the right bucket. This way the returned item will be the correct one even in the case of a hash-collision...

If you want to have it quick and dirty though, there is this

hackish solution:

Create two maps std::map<size_t, A> mapA and std::map<size_t, B> mapB. Upon insertion hash both elements that are to be inserted to get the keys to the respective maps.

void insert(const A &a, const B &b) {
    size_t hashA = std::hash<A>(a);
    size_t hashB = std::hash<B>(b);

    mapA.insert({hashB, a});
    mapB.insert({hashA, b});
}

Lookup is implemented analogously.

Using a multimap instead of a map and verifying every element you get with a lookup in the respectively other map (get candidate b from mapA, hash b and look in mapB if it matches the wanted key, iterate to the next candidate b otherwise) this is a valid implementation - but still hackish in my opinion...

You can get a much nicer solution by using the copies of the elements that are used to compare the entries (see above) as only storage. It is a bit harder to get your head around that though. To elaborate:

a nicer solution:

Create two sets of pairs as std::set<pair<A, B*>> and std::set<pair<B, A*>> and overload the operator< and operator== to only take the first element of the pairs into account (or provide an corresponding comparion class). It is necessary to create sets of pairs instead of maps (which internally look similarly) because we need a guarantee that A and B will be at constant positions in memory. Upon insertion of an pair<A, B> we split it into two elements that fit into the above sets.

  std::set<pair<B, A*>> mapA;
  std::set<pair<A, B*>> mapB;

  void insert(const A &a, const B &b) {
      auto aitr = mapA.insert({b, nullptr}).first; // creates first pair
      B *bp = &(aitr->first);  // get pointer of our stored copy of b
      auto bitr = mapB.insert({a, bp}).first; 
      // insert second pair {a, pointer_to_b}
      A *ap = &(bitr->first);  // update pointer in mapA to point to a
      aitr->second = ap;
  }

Lookup can now simply be done by a simple std::set lookup and a pointer dereference.

This nicer solution is similar to the solution that boost uses - even though they use some annonymized pointers as second elements of the pairs and thus have to use reinterpret_casts.

Note that the .second part of the pairs need to be mutable (so I'm not sure std::pair can be used), or you have to add another layer of abstraction (std::set<pair<B, A**>> mapA) even for this simple insertion. In both solutions you need temporary elements to return non-const references to elements.

Guillerminaguillermo answered 20/2, 2014 at 18:57 Comment(9)
Obviously the "nice solution" still is only the idea of the actual implementation. There are a few subtle points regarding mutability. The hackish solution should be fairly straight forward - though you need some tricks (temporary elements) even with that solution to be able to return non-const references to elements.Guillerminaguillermo
I awarded this answer the bounty because it shows concrete examples of a possible bimap implementation.Lagan
How is using two sets of pairs better than using two maps?Kristof
@Kristof with maps you don't have access to where the keys are stored (and are thus not guaranteed that they will always be at the same locations in memory)Guillerminaguillermo
This was helpful, thanks - could you identify where those quotes are from?Stellarator
@Guillerminaguillermo You do have access to the address where keys are stored - just use the & dereference operator. Also, std::set<Key, Value> and std::map<Key, Value> store std::pair<Key, Value> data item inside each node. The address of the nodes are invariant through the lifetime of the containers. These together give you the guarantee of invariant memory address of .first and .second parts, so since std::map allows to mutate the mapped value, it would be a perfect and hassle-free container for the proposed idea.Carib
@Carib I tried the same with map and it's much easier but how do you get the address of a const key of a map to store it in the value of the 2nd?Alleged
@lucieon You insert anstd::pair<key_type, mapped_type> where second is default-constructed like auto it = map.insert({ key, mapped_type{} }).first, then you assign the address of std::pair::first to std::pair::second as it->second = &it->first. Also, I use std::reference_wrapper<const key_type> for the mapped_type, since it provides a nice interface to use the address as an actual reference to the value (there is no need to dereference after lookup).Carib
@example, in the "nicer solution", is your lookup iterating through the set?Noland
B
19

It would be more efficient to store all elements in a vector and have 2 maps of <T1*,T2*> and <T2*,T1*> that way you would not have everything copied twice.

The way I see it you are trying to store 2 things, elements themselves and the relationship between them, if you are aiming to scalar types you could leave it as is 2 maps, but if you aim to treat complex types it makes more sense to separate the storage from the relationships, and handle relationships outside the storage.

Boothe answered 13/2, 2014 at 16:52 Comment(12)
Pointers to vector elements would make insertion/deletion pretty rough. I'd probably use lists for the backing store and iterators for the 2 maps.Spaniel
Or even, one map of elements, and one of pointers. The nodes themselves don't get moved when the tree is rebalanced.Landscapist
Interesting. However I should use a std::vector<std::unique_ptr<T>>, since pointers cannot be invalidated and some classes may be non-copyable. I suppose accessing a member would be less efficient (but memory usage would be halved).Lagan
Casey is right in that if you expect frequent insertions/deletions you better use a backing storage that is good with that, like a list rather than a vector, if it is immutable a vector would be a good choice.Boothe
If you go with vector depending on alignment of the individual types versus the combined types it also could be a good choice to use two vectors instead of one.Perm
don't forget that the dual vector of pointer would need to be keept in sync if your data change... like deleting from the first vector when you delete from the second and so on.Vermiculite
@BoBTFish: I suppose I can simply point to the elements inside of the first map, since interators/pointers shouldn't be invalidated by element addition/removal on std::map and std::unordered_map, correct?Lagan
@VittorioRomeo Correct. (As long as you aren't removing the elements you are pointing at!) But it's a bit fiddly. Last time I needed this I just wrapped two maps like you did at the top. Project is a small test client, doesn't have boost.Landscapist
@BoBTFish: I've tried implementing it, but it's much harder than it sounds. I'm having a lot of trouble with const-correctness and also with managing the lifetime of the objects. Seems std::map's key type must be const, but since the key type of the second map is the value type of the first one, it's expected to be mutable.Lagan
Use indices instead of pointers and swap-and-pop instead of erase and almost the issues with vector as a backing store go away. Use deque (or a reasonable reimplementation, given how terrible some of the compiler-provided ones are) and the rest of the problems with the backing store go away. You could even use the vector/deque as the backing store for your map nodes themselves with some work, which can have a nice boost to memory performance.County
why not just make both maps contain std::shared_ptr's?Tarsometatarsus
Lookup will not be O(log n) unless you already know the pointer that was used in the map<A*,B*> from the outside - in which case you could just as well have used std::pair<A,B> and get rid of all lookups. Iterating over the vectors to find the pointers matching the requested keys are not very map-like...Guillerminaguillermo
C
15

Boost Bimap makes use of Boost Mutant Idiom.

From the linked wikipedia page:

Boost mutant idiom makes use of reinterpret_cast and depends heavily on assumption that the memory layouts of two different structures with identical data members (types and order) are interchangeable. Although the C++ standard does not guarantee this property, virtually all the compilers satisfy it.

template <class Pair>
struct Reverse
{
    typedef typename Pair::first_type  second_type;
    typedef typename Pair::second_type first_type;
    second_type second;
    first_type first;
};

template <class Pair>
Reverse<Pair> & mutate(Pair & p)
{
  return reinterpret_cast<Reverse<Pair> &>(p);
}

int main(void)
{
  std::pair<double, int> p(1.34, 5);

  std::cout << "p.first = " << p.first << ", p.second = "  << p.second << std::endl;
  std::cout << "mutate(p).first = " << mutate(p).first << ", mutate(p).second = "  << mutate(p).second << std::endl;
}

The implementation in boost sources is of course fairly hairier.

Chavez answered 13/2, 2014 at 17:20 Comment(5)
I don't get how this would help: do I store a std::map<Pair> and std::map<ReversePair> in my bimap class? I understand what's happening with the reinterpret_cast, but I don't understand how this can be used for a bidirectional map.Lagan
Can someone elaborate on this answer? It looks interesting, and I'm curious.Lagan
@VittorioRomeo: This might shed some light: boost.org/doc/libs/1_55_0/libs/bimap/doc/html/boost_bimap/… (see the final paragraph).Kristof
@Kristof I understand the theory behind this - I'm trying to have a std::set<T1*> and std::set<T2*>, and I can retrieve T2 items from T1, but I can't find a way to use the mutation for retrieve T1 items from T2. Here's my progress: pastie.org/8752050 How should I use the reverse pair idiom here?Lagan
It is true that boost uses the Mutate idiom to hide the type of the underlying objects - but that is pretty much all that the mutate idiom has to do with the bimap. The algorithmic relevant part is the pairing of the element a with a pointer to element b and vice versa and storing them in respective sets (see my answer).Guillerminaguillermo
A
9

If you create a set of pairs to your types std::set<std::pair<X,Y>> you pretty much have your functionallity implemented and rules about mutabillity and constness preset (OK maybe the settings aren't what you want but tweaks can be made). So here is the code :

#ifndef MYBIMAP_HPP
#define MYBIMAP_HPP

#include <set>
#include <utility>
#include <algorithm>

using std::make_pair;

template<typename X, typename Y, typename Xless = std::less<X>, 
                     typename Yless = std::less<Y>>
class bimap
{
    typedef std::pair<X, Y>                             key_type;
    typedef std::pair<X, Y>                             value_type;
    typedef typename std::set<key_type>::iterator       iterator;
    typedef typename std::set<key_type>::const_iterator const_iterator;

    struct Xcomp
    {
        bool operator()(X const &x1, X const &x2) const
        {
            return !Xless()(x1, x2) && !Xless()(x2, x1);
        }
    };
    struct Ycomp
    {
        bool operator()(Y const &y1, Y const &y2) const
        {
            return !Yless()(y1, y2) && !Yless()(y2, y1);
        }
    };
    struct Fless 
    { // prevents lexicographical comparison for std::pair, so that 
        // every .first value is unique as if it was in its own map
        bool operator()(key_type const &lhs, key_type const &rhs) const
        {
            return Xless()(lhs.first, rhs.first);
        }
    };
    /// key and value type are interchangeable
    std::set<std::pair<X, Y>, Fless> _data;

public:
    std::pair<iterator, bool> insert(X const &x, Y const &y)
    {
        auto it = find_right(y);
        if (it == end()) { // every .second value is unique
            return _data.insert(make_pair(x, y));
        }
        return make_pair(it, false);
    }
    iterator find_left(X const &val)
    {
        return _data.find(make_pair(val,Y()));
    }
    iterator find_right(Y const &val)
    {
        return std::find_if(_data.begin(), _data.end(), 
            [&val](key_type const &kt)
        {
            return Ycomp()(kt.second, val);
        });
    }
    iterator end()   { return _data.end();   }
    iterator begin() { return _data.begin(); }
};

#endif

Example usage

template<typename X, typename Y, typename In>
void PrintBimapInsertion(X const &x, Y const &y, In const &in)
{
    if (in.second) {
        std::cout << "Inserted element (" 
              << in.first->first << ", " << in.first->second << ")\n";
    }
    else {
        std::cout << "Could not insert (" << x << ", " << y 
                      << ") because (" <<  in.first->first << ", " 
                      << in.first->second << ") already exists\n";
    }
}


int _tmain(int argc, _TCHAR* argv[])
{
    bimap<std::string, int> mb;
    PrintBimapInsertion("A", 1, mb.insert("A", 1) );
    PrintBimapInsertion("A", 2, mb.insert("A", 2) );
    PrintBimapInsertion("b", 2, mb.insert("b", 2));
    PrintBimapInsertion("z", 2, mb.insert("z", 2));

    auto it1 = mb.find_left("A");
    if (it1 != mb.end()) {
        std::cout << std::endl << it1->first << ", " 
                      << it1->second << std::endl;
    }

    auto it2 = mb.find_right(2);
    if (it2 != mb.end()) {
        std::cout << std::endl << it2->first << ", " 
                      << it2->second << std::endl;
    }

    return 0;
}

NOTE: All this is a rough code sketching of what a full implementation would be and even after polishing and extending the code I'm not implying that this would be an alternative to boost::bimap but merely a homemade way of having an associative container searchable by both the value and the key.

Live example

Acadian answered 16/2, 2014 at 12:58 Comment(9)
This seems inefficient: std::set is slow to iterate and insertion/lookup require O(n) in the worst case - the benefit about using maps is the fast lookup/insertion time even with a lot of elementsLagan
Both set and map have the same (logarithmic) complexity of such operations. Maybe you're confusing them with unordered containers (hash tables). As Josuttis says: You can consider a set as special kind of map where the the value is identical to the key. In fact all these associative container types are usually implemented by using the same basic implementation of a binary treeAcadian
Yes, I apologize, I was confused. Considering the fact that the complexity of std::set is similar to std::map, this seems like a decent solution. Is there any obvious drawback? (Any performance hit in the lookup, for example, compared to my naive implementation?)Lagan
Well the Tree is constructed with the comparison operator for the right_key X so maybe lookup for Y isn't optimal, but I don't know the internals enough to even guess (maybe there are no worries there). In any case I hope to make up for that by providing only one _data to worry about (update, insert into, delete from, check etc etc) instead of two that would force more complicated code (more complicated bugs), more memory usage and duplicate actionsAcadian
The implementation of find seems inefficient as std::find_if has linear complexity (en.cppreference.com/w/cpp/algorithm/find) so using std::set here is kind of pointless (except for fast insertion).Kristof
@Kristof That was not my intention, please review the change in find_left (there never was a need to use std::find there)Acadian
@VittorioRomeo The drawback is the lookup. log time lookup can be implemented for X, but linear time lookup needs to be performed for Y.Sightly
Xcomp::operator(), Ycomp::operator() and Fless::operator() should be marked const. The code didn't compile for me without itAriannaarianne
@Ariannaarianne thanks, I've updated the code. Interestingly this used to compile .. maybe compiler checks got stricter since 2014. I think it's because of the creation of a X/Ycomp() temporaries in the comparison functions. They bind to const T& so invocation requires them to be const methods. Probably it would also work if we specify the method as && and make it available to x-values.Acadian
P
6

One possible implementation that uses an intermediate data structure and an indirection is:

int sz;  // total elements in the bimap

std::map<A, int> mapA;
std::map<B, int> mapB;

typedef typename std::map<A, int>::iterator iterA;
typedef typename std::map<B, int>::iterator iterB;

std::vector<pair<iterA, iterB>> register;  
// All the operations on bimap are indirected through it.

Insertion

Suppose you have to insert (X, Y) where X, Y are instances of A and B respectively, then:

  1. Insert (X, sz) in mapA --- O(lg sz)
  2. Insert (Y, sz) in mapB --- O(lg sz)
  3. Then push_back (IterX, IterY) in register --- O(1). Here IterX and IterY are iterators to the corresponding element in mapA and mapB and are obtained from (1) and (2) respectively.

Lookup

Lookup for the image of an element, X, of type A:

  1. Get the int mapped to X in mapA. --- O(lg n)
  2. Use the int to index into register and get corresponding IterY. --- O(1)
  3. Once you have IterY, you can get Y through IterY->first. --- O(1)

So both the operations are implemented in O(lg n).

Space: All the copies of the objects of A and B are required to be stored only once. There is, however, a lot of bookkeeping stuff. But when you have large objects, that would also be not much significant.

Note: This implementation relies on the fact that the iterators of a map are never invalidated. Hence, the contents of register are always valid.

A more elaborate version of this implementation can be found here

Pagan answered 11/6, 2015 at 9:10 Comment(0)
E
1

How about this?

Here, we avoid double storage of one type (T1). The other type (T2) is still double stored.

// Assume T1 is relatively heavier (e.g. string) than T2 (e.g. int family).
// If not, client should instantiate this the other way.
template <typename T1, typename T2>
class UnorderedBimap {

  typedef std::unordered_map<T1, T2> Map1;
  Map1 map1_;

  std::unordered_map<T2, Map1::iterator> map2_;
};
Equable answered 29/3, 2016 at 23:0 Comment(1)
A drawback is that if Map1 has been rehashed, we have to rebuild map2_, because all the iterators in map2_ have been invalidated.Councilwoman

© 2022 - 2024 — McMap. All rights reserved.