Constant time `contains` for `std::vector`? [duplicate]
Asked Answered
S

2

5

I am working with some code that checks if std::vector contains a given element in constant time by comparing its address to those describing the extent of the vector's data. However I suspect that, although it works, it relies on undefined behaviour. If the element is not contained by the vector then the pointer comparisons are not permitted.

bool contains(const std::vector<T>& v, const T& a) {
  return (v.data() <= &a) && (&a < v.data() + v.size());
}

Am I right in believing it is undefined behaviour? If so, is there any way to do the same thing without drastically changing the time complexity of the code?

Sneed answered 23/9, 2021 at 7:0 Comment(4)
Did you test your code? Unless you take an element from the vector (which will obviously be contained in the vector) this won't even work: godbolt.org/z/hrY1d4foxOctofoil
@mediocrevegetable1: Yes it is meant to check for containment by identity, not equality.Sneed
somewhere up the line you either took a reference to an element in the vector or to a different T, so maybe it is an option to use a std::pair<bool,std::reference_wrapper<T>> instead of the bare reference in the first placeAstrobiology
@463035818-is-not-a-number: Yes that is a possibility. I could keep a record of which vector the reference was obtained from. However it adds a bit of (perhaps!) unnecessary complexity to the code.Sneed
E
10

You can use std::less

A specialization of std::less for any pointer type yields the implementation-defined strict total order, even if the built-in < operator does not.


Update:

The standard doesn't guarantee that this will actually work for contains though. If you have say two vectors a and b, the total order is permitted to be &a[0], &b[0], &a[1], &b[1], &a[2], &b[2], ..., i.e., with the elements interleaved.

As pointed out in the comments, the standard only guarantees that std::less yields the implementation-defined strict total order, which is is consistent with the partial order imposed by the builtin operators. However, the standard doesn't guarantee the order of pointers pointing to different objects or arrays. Releated: https://devblogs.microsoft.com/oldnewthing/20170927-00/?p=97095

One interesting thing is that there's a similar usage in Herb Sutter's gcpp library(link). There's a comment saying that it is portable, the library is experimental though.

    //  Return whether p points into this page's storage and is allocated.
    //
    inline
    bool gpage::contains(gsl::not_null<const byte*> p) const noexcept {
        //  Use std::less<> to compare (possibly unrelated) pointers portably
        auto const cmp = std::less<>{};
        auto const ext = extent();
        return !cmp(p, ext.data()) && cmp(p, ext.data() + ext.size());
    }
Epilepsy answered 23/9, 2021 at 7:8 Comment(5)
I understood this to mean that std::less simply bypasses any redefinition of the operator<. Wouldn't the undefined behaviour of the pointer comparison still remain?Sneed
@RichardForrest std::less invokes operator< on type T unless specialized. It is specialized for pointer type as the quoted text saying. This is also the reason that you can use something like std::set<T*> without relying on undefined behavior.Epilepsy
That's great. This should work perfectly for my situation then. I was not expecting such a good outcome. Thanks.Sneed
The standard doesn't guarantee that this will actually work for contains though. If you have say two vectors a and b, the total order is permitted to be &a[0], &b[0], &a[1], &b[1], &a[2], &b[2], ..., i.e., with the elements interleaved.Highup
@Highup I don't known much about the standard. That seems correct if I read only sections directly related to std::less. I think that a new question is worth it.Epilepsy
G
4

Yes, the comparisons as written are not permitted if the reference doesn't reference something that is already an element of the vector.

You can make the behavior defined by casting all pointers to uintptr_t and comparing those. This will work on all architectures with continuous memory (i.e. possibly not old 16-bit x86), although I don't know if the specific semantics are guaranteed.

As a side note, I would always interpret the name contains to be about the value, and thus be very surprised if the semantics are anything other than std::find(v.begin(), v.end(), a) != v.end(). Consider using a more expressive name.

Generate answered 23/9, 2021 at 7:8 Comment(1)
Comparing integers casted to pointers isn't technically guaranteed to yield the same result as comparing the pointers. The mapping from pointer to integer is implementation defined.Lingonberry

© 2022 - 2024 — McMap. All rights reserved.