how to implement a sparse_vector class

D

2

4

I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that are different from their default constructed value.

So, sparse_vector would store the lazily-sorted index-value pairs for all indices whose value is not T().

I am basing my implementation on existing sparse vectors in numeric libraries-- though mine will handle non-numeric types T as well. I looked at boost::numeric::ublas::coordinate_vector and eigen::SparseVector.

Both store:

size_t* indices_;  // a dynamic array
T* values_;  // a dynamic array 
int size_;
int capacity_;

Why don't they simply use

vector<pair<size_t, T>> data_;

My main question is what are the pros and cons of both systems, and which is ultimately better?

The vector of pairs manages size_ and capacity_ for you, and simplifies the accompanying iterator classes; it also has one memory block instead of two, so it incurs half the reallocations, and might have better locality of reference.

The other solution might search more quickly since the cache lines fill up with only index data during a search. There might also be some alignment advantages if T is an 8-byte type?

It seems to me that vector of pairs is the better solution, yet both containers chose the other solution. Why?

Dramaturge answered 11/5, 2010 at 6:14 Comment(4)

I'd think a map + a size/capacity would be better than any array/vector option. o.0 – Forepaw 11/5, 2010 at 6:20

To start, having an uninitialized T in a pair<size_t, T> will muck with pair's ctors and dtor (and is essentially unworkable). Right off the bat you'd need your own specialized utility class (something like boost::optional). – Colon 11/5, 2010 at 6:21

@Dav: boost has another sparse vector type that uses a map. It would be a little slower in practice with vectors with many entries. It's a good solution though. @Roger Pate: The T's that you store are all initialized. – Dramaturge 11/5, 2010 at 6:24

@Dav: clarified that the vector is sorted (when doing finds, but not suring insertions...) so it is as fast as map for lookups (except the first)-- and incurs a fraction of the cache misses – Dramaturge 11/5, 2010 at 6:32

O

1

Effectively, it seems that they reinvented the wheel (so to speak).

I would personally consider 2 libraries for your need:

Loki, for Loki::AssocVector -> the interface of a map implemented over a vector (which is what you wish to do)
Boost.Iterator, for its iterator_adaptor class. Makes it very easy to implement a new container by Composition.

As a remark, I would note that you may wish to be a little more generic that values different from the T() because this impose T to be DefaultConstructible. You could provide a constructor which takes a T const&. When writing a generic container it is good to try and reduce the necessary requirements as much as possible (as long as it does not hurt performance).

Also, I would remind you that the idea of using a vector for storage is very good for a little number of values, but you might wish to change the underlying container toward a classic map or unordered_map if the number of values grows. It could be worth profiling/timing. Note that the STL offer this ability with the Container Adapters like stack, even though it could make implementation slightly harder.

Have fun.

Overalls answered 11/5, 2010 at 7:0 Comment(1)

thanks for pointing out iterator_adaptor -- did not know about it. – Dramaturge 11/5, 2010 at 15:4

U

2

Having indices in a separate list would make them faster to look up - as you suggest, it would use the cache more effectively, particularly if T is large.

If you want to implement your own, why not just use std::map (or std::unordered_map)? Keys would be larger but implementation time would be close to zero!

Unwish answered 11/5, 2010 at 6:28 Comment(0)

O

1