Elegant way to find keys with given prefix in std::map or elements in std::set
Asked Answered
R

2

10

I have map, which keys are std::string. I want to find those elements in the map which starts with "DUPA/" prefix. Finding lower bound is easy but upper bound is a bit problematic. I wrote such piece of code:

const char* prefix = "DUPA/";
const char* firstAfterPrefix = "DUPA0";
auto prefixedBeginIt = myMap.upper_bound(prefix);
auto prefixedEndIt = myMap.lower_bound(firstAfterPrefix);

The code works fine but I don't consider it to be elegant, as one has to know that 0 is first next to / in ASCII table. Second way would be to copy prefix and increment last sign. Do you know more elegant solution?

Ramer answered 23/6, 2017 at 9:30 Comment(2)
Your prefixedBeginIt will not find a key identical to the prefix, you should use lower_bound instead of upper_bound.Quag
@Quag that is not connected to the question at all. This is expected behavior not to find prefix itself.Bookbinding
M
5

I think the solution you mentioned is already the most elegant. The KISS way loses a lot of performance, that is, checking the key each time:

while(prefixedBeginIt->first == prefix)
{
 //...
 ++prefixedBeginIt;
}

Thus I think calculating the next char is the best approach:

std::string firstAfterPrefix = prefix;
++firstAfterPrefix[firstAfterPrefix.length() - 1];
auto prefixedEndIt = myMap.lower_bound(firstAfterPrefix);
Marney answered 23/6, 2017 at 9:46 Comment(2)
Yes, I think it could be the best solution, but I would use char[] instead of std::string in this particular problem. But as a generic solution, It's the best as long as last char does not equal MAX_CHAR ;)Bookbinding
Looks like this will crash and burn when prefix is the empty string?Subteen
C
2

If you can assume that CHAR_MAX would not be a valid character in your strings, then you can create firstAfterPrefix by appending CHAR_MAX (or the maximum value of your character type if it's not char).

std::string prefix = "DUPA/";

constexpr auto char_max = std::numeric_limits<decltype(prefix)::value_type>::max();
std::string firstAfterPrefix = prefix + char_max;

auto prefixedBeginIt = myMap.lower_bound(prefix);
auto prefixedEndIt = myMap.lower_bound(firstAfterPrefix);

Note the use of lower_bound for both bounds. Like Gill I am using std::string to simplify the exposition.


If you can use C++14 and specify the Compare template argument of the container then another way is to use a custom probe object:

struct PrefixProbe { std::string_view prefix; };
bool operator<(PrefixProbe a, std::string_view b) { return a.prefix < b.substr(0, a.prefix.size()); }
bool operator<(std::string_view a, PrefixProbe b) { return a.substr(0, b.prefix.size()) < b.prefix; }

std::map<std::string, myValue, std::less<>> myMap;
//                             ^~~~~~~~~~~
//                             where the magic happens

auto prefixBegin = myMap.lower_bound(PrefixProbe { prefix });
auto prefixEnd = myMap.upper_bound(PrefixProbe { prefix });

std::string_view is C++17 but is not required to make this work.

equal_range would reduce the last two lines to a single line:

auto [ prefixBegin, prefixEnd ] = myMap.equal_range(PrefixProbe { prefix });

If you are prepared to use the STL algorithms instead of the container member functions then this can be done without altering the container type, but would be less efficient:

auto prefixBegin = std::lower_bound(cbegin(myMap), cend(myMap), PrefixProbe { prefix }, std::less<>{});
auto prefixEnd = std::upper_bound(cbegin(myMap), cend(myMap), PrefixProbe { prefix }, std::less<>{});

auto [ prefixBegin, prefixEnd ] = std::equal_range(cbegin(myMap), cend(myMap), PrefixProbe { prefix }, std::less<>{});
Chamois answered 23/6, 2017 at 14:27 Comment(7)
The probe idea is brilliant; I think it's the elegant answer to the elegant general form of this question (that is, searching for a condition that isn't naturally expressible as the upper or lower bound of a specific key). In contrast, your first alternative idea of appending CHAR_MAX seems rather mundane and fiddly and special-purpose; I'd consider just removing that part.Subteen
Maybe add a reference to an introduction to heterogenous lookup, for the uninitiated? This is a nice one. You do say "std::less<> ^where the magic happens", which is true, but I'm not sure that's helpful to someone not already familiar with heterogeneous lookup.Subteen
Why do you say std::equal_range would be less efficient than the member function? The implementations of both are in header files, all visible to the compiler which will inline it, so I'd expect both to compile to compile to the same thing. No?Subteen
a.substr(0, b.prefix.size()) == b.prefix isn't a very good idiom for checking prefix-ness, performance-wise. There's a good one here: #1878501Subteen
I think your operator< functions aren't quite right. Try them on std::set<std::string, std::less<>> myMap = {"","A","AA","AB","AC","B","BA","BB","BC","C","CA","CB","CC"};. Your myMap.equal_range(PrefixProbe("B") returns {"","A","AA","AB","AC","B"}. Correct answer is {"B","BA","BB","BC"}.Subteen
I discovered this method while developing a project, but I had doubts and went online. I am doing the exact same thing but use upper_bound for the second part. I noticed you pointed out using lower_bound, in all my example cases they give the same result. Am I missing something, or in this specific case, they really are the same thing ?Vasilek
@MaxPaython You're right, you'll get the same result either way. Not sure why I pointed it out. If a key might contain a CHAR_MAX character then it would be different, but in that case neither would be guaranteed to work anyway.Chamois

© 2022 - 2024 — McMap. All rights reserved.