Using an iterator to Divide an Array into Parts with Unequal Size

I have an array which I need to divide up into 3-element sub-arrays. I wanted to do this with iterators, but I end up iterating past the end of the array and segfaulting, even though I don't dereference the iterator. Given auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; I'm doing:

auto bar = cbegin(foo);

for (auto it = next(bar, 3); it < foo.end(); bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << endl; });

Now I can solve this by defining a finish iterator:

auto bar = cbegin(foo);
auto finish = next(cend(foo), -(size(foo) % 3));

for (auto it = next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, finish, [](const auto& i) { cout << i << endl; });
for_each(finish, cend(foo), [](const auto& i) { cout << i << endl; });

But this seems unnecessary when I don't dereference the iterator. Why can't I do the first version?

Lumbar answered 5/4, 2016 at 11:40 Comment(12)
That's what the standard says: you cannot obtain an iterator outside [begin, end]. Additionally, you cannot dereference the end iterator. This rule is an extension of the pointer rules: you cannot obtain a pointer that doesn't point to an object, an element of an array, or one past the last element of an array.Nairn
@Nairn Do you have a source for that? I mean it's just a number in an int till I dereference it, right?Lumbar
Your algorithm seems to depend on a random access iterator; you might use an index (size_t) and operator[] instead.Saiz
@DieterLücking I actually had also written up an index version, which seems more hopeful. In that at least it isn't segfaulting :(Lumbar
I'm sure someone will come and add standard quotes. As for reasons: C++ is meant to be as generic as possible; it has to work on all sorts of crazy, unthinkable architectures. The idea is that you have to ask yourself what happens, for instance, when your array is near the end of the addressable memory space: getting last + 10 would not be pointing to invalid memory, but would not make sense, as, let's say, last + 5 is the last memory address. So the standard says it's undefined behavior.Nairn
@Nairn You make a good point about exceeding the size of an int. But we know that C++ has to be able to return an end pointer, which means that it cannot allocate space in the very last line of accessible memory. At a maximum it could only allocate on the next-to-last line. Meaning that as long as my modulo number is smaller than sizeof(int*) * 8 there shouldn't be any undefined behavior.Lumbar
@JonathanMee I don't follow you. It is required that an iterator/pointer to one past the last element of an array must be valid. Past that there is no guarantee; obtaining a pointer beyond one-past-the-last is undefined behavior. And I think you confuse things a bit. I don't see any relation to sizeof(int*) * 8. I say again: last can hypothetically be a pointer to the last addressable memory for an object type stored by the array. Going past last is undefined behavior.Nairn
Also I was careful not talking about int. There are pointers and iterators and then there is int. The standard only guarantees that a pointer can be converted to an intptr_t and then back again to a pointer and you get the original pointer value. It doesn't even make guarantees about the intptr_t value.Nairn
@Nairn I mean we are talking about an int here (or a long to be precise.) Pointers are simply a memory index that boils down to a unique number, hence we can compare int*s just like ints, and that's what I'm doing here.Lumbar
@JonathanMee You would think so. I did too. But no. We are not talking about integers. We are talking about pointers. And even though on most architectures and most compilers pointers are implemented via an integer type, the standard doesn't make this connection. For instance, if you have int a; int b; it is undefined behavior to compare the pointers to a and b, i.e. &a < &b is undefined behavior, even if an int * is just an integer underneath. There are rules for integers and then there are rules for pointers. For better or worse, that's how C++ was designed.Nairn
@Nairn You got a source for &a < &b being undefined?Lumbar
From § 5.9 of the C++11 standard: "If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified.". My bad, it's not undefined, it's unspecified.Nairn

The reason this is prohibited is covered well at your other question Are iterators past the "one past-the-end" iterator undefined behavior? so I'll just address improved solutions.

For random-access iterators (which you must have if you are using <), there's no need whatsoever for the expensive modulo operation.

The salient points are that:

  • it + stride fails when it nears the end
  • end() - stride fails if the container contains too few elements
  • end() - it is always legal

From there, it's simple algebraic manipulation to change it + stride < end() into a legal form (subtract it from both sides).

The final result, which I have used many times:

for( auto it = c.cbegin(), end = c.cend(); end - it >= stride; it += stride )

The compiler is free to optimize that back to comparison to a precomputed end - stride * sizeof(*it) if the memory model is flat -- the limitations of C++ behavior don't apply to the primitive operations which the compiler translates C++ into.

You may of course use std::distance(it, end) if you prefer to use the named functions instead of operators, but that will only be efficient for random-access iterators.

For use with forward iterators, you should use something that combines the increment and termination conditions like

struct less_preferred { size_t value; less_preferred(size_t v) : value(v){} };

template<typename Iterator>
bool try_advance( Iterator& it, less_preferred step, Iterator end )
{
     while (step.value--) {
         if (it == end) return false;
         ++it;
     }
     return true;
}

With this additional overload, you'll get efficient behavior for random-access iterators:

template<typename RandomIterator>
auto try_advance( RandomIterator& it, size_t stride, RandomIterator end )
     -> decltype(end - it < stride) // SFINAE
{
     if (end - it < stride) return false;
     it += stride;
     return true;
}
Ultramicroscope answered 17/5, 2016 at 15:18 Comment(3)
OK, so you're saying that one integer modulo is going to be more expensive than c.size() / stride subtractions? Sounds fishy.Lumbar
@JonathanMee: At runtime the it != final is going to be executed as it - final != 0 anyway; I expect the optimizer to make end - it >= stride just as efficient.Ultramicroscope
A couple comments: 1) try_advance can't compile, even after I edited away the bugs, because random access iterators match both specializations 2) Using "-O3" on gcc it != final generates only a comparison, not a subtraction, while end - it >= stride does incur a subtraction each call which I expect makes it more expensive. I've tried to shrink the supporting data into a comment, but I can't even, so I've added it as a new answer: https://mcmap.net/q/41433/-using-an-iterator-to-divide-an-array-into-parts-with-unequal-size I'm not sure how relevant this is to the question so I'll probably just delete it after we finish discussing.Lumbar

The segfault you are seeing comes from next checking the range for you — an assertion in your Debug implementation guarding against undefined behavior. The behavior of iterators and pointers is not defined beyond their allocated range and the "one past-the-end" element: Are iterators past the "one past-the-end" iterator undefined behavior?

This means that incrementing past the "one past-the-end" element is undefined behavior independent of the iterator's subsequent use. In order to have defined behavior you must use a solution like your Integer Modulo algorithm or similar, but you will have to change auto it = next(bar, 3) to something that guards against there being fewer than a full sub-array of elements available, for example: auto it = size(foo) <= 3 ? finish : next(bar, 3).

Where available, the best solution here — the one causing the least redundant iteration — is to track the size remaining in the container as an integer, which, unlike an iterator, does not suffer from undefined behavior when it corresponds to a position beyond "one past-the-end". This can be accomplished by:

auto bar = cbegin(foo);
auto i = size(foo);

for (; i > STEP; i -= STEP) {
    for (auto j = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
    cout << endl;
}

while (i--) cout << *bar++ << '\t';
cout << endl;

EDIT: I had previously suggested using pointers, which are not Debug-conditioned; this is undefined behavior.

The problem is that next is checking the range for you. We use pointers outside of allocated memory all the time, for example nullptr and end, and that's all it is here. If you just use C-style pointer arithmetic here you'll be fine:

auto bar = cbegin(foo);

for (auto it = bar + 3; it < cend(foo); bar = it, it = bar + 3) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << '\t' << i << endl; });

Live Example

Alternatively, if you run in Release configuration the range checks should be removed, so you will be able to use the first version of your code.

Lumbar answered 5/4, 2016 at 11:58 Comment(12)
This still may invoke UB. from [iterator.requirements](5): [...]Iterators can also have singular values that are not associated with any sequence.[...] Results of most expressions are undefined for singular values[...]Larrikin
@Larrikin "Iterators can also have singular values that are not associated with any sequence." Just as end is associated with this sequence it is associated with this sequence, they're just both, past the end iterators, which are not dereferenceable, but I don't think that either qualifies as a "Singular Value".Lumbar
I am not sure. I do not know if 2 past the end is less valid than 1 past the end. There is: Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values.Larrikin
@NathatOliver Yeah, that's what I was referencing here: #36425893 But your point is well made; depending on the size of my partition, bar + # may overflow the integral value, in which case this wouldn't work. If you want to type that up, I'll accept; otherwise, I'll just correct my answer.Lumbar
The problem isn't that the debug configuration does a range check. The problem is that you have undefined behavior.Nairn
This has nothing to do with overflowing. If last is one past the end of the array, then is last + 1 still considered a past-the-end iterator, or is it a singular iterator and UB? I am still searching for that.Larrikin
I found it. see this answer. Not sure if this actually qualifies as a dupe or not.Larrikin
@Larrikin Excellent find sir. It's annoying that it calls out a string. More annoying that the answer directly references string but it seems very relevant...Lumbar
@JonathanMee No problem. As you can see from my comment I am not 100% sure if the answer is correct or not. 8 other people agreed with it but it is not accepted so there is that. I am still trying to find something in the standard.Larrikin
@Larrikin I don't think that you'll find anything in the standard. I think that arithmetic on pointers is always legal, it's dereferencing them that's only legal within an allocated range. Here's the thing though, I don't know where in memory this is allocated. If it's at the upper bound of what I can index, and my increment pushes the index past that upper bound I think that's when it would be a problem.Lumbar
@Nairn I have come around to your way of thinking and updated the answer. Thanks for holding me to the standard.Lumbar
@Larrikin I finally got standard requirement we were looking for. I've linked it and updated my answer.Lumbar

There is some disagreement about the most effective way to accomplish this iteration through array partitions.

First, the one-time integer modulo method; this must define auto size in addition to the changes in my answer, because gcc does not yet support size:

auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };  
auto size = distance(cbegin(foo), cend(foo));
auto bar = cbegin(foo);
auto finish = prev(cend(foo), size % 3);

for(auto it = size <= 3 ? cend(foo) : next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
cout << endl;
for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

This creates 112 lines of assembly, most notably the conditional it != finish generates these instructions:

cmpq    %r12, %r13
je      .L19
movq    %r12, %rbx
jmp     .L10

Second the repeated iterator subtraction using Ben Voigt's try_advance but only with the random access specialization because there is a compiler conflict for random access iterators:

auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };  
auto bar = cbegin(foo);

for (auto it = cbegin(foo), end = cend(foo); try_advance(it, 3, end); bar = it) {
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

This creates 119 lines of assembly; most notably the conditional in try_advance: if (end - it < stride) return false; incurs a subtraction per iteration, generating the code:

movq    %r12, %rax
subq    %rbp, %rax
cmpq    $11, %rax
ja      .L3

Upon learning that cmpq is really just a subtract-and-compare operation, I have written some benchmarking code: http://coliru.stacked-crooked.com/a/ad869f69c8dbd96f I needed to use Coliru to be able to turn on optimization, but it keeps giving me bogus increments of my test count for times; I'm not sure what's going on there. What I can say is that locally, the repeated iterator subtraction is always faster, sometimes significantly so. Upon learning this, I believe that Ben Voigt's answer should be marked as the correct one.

EDIT:

I've made an interesting discovery: it's the algorithm that goes first that always loses. I've rewritten the code to swap which algorithm goes first on each pass. When this is done, the integer modulo method always beats the iterator subtraction method, as would be suspected by looking at the assembly; again, something fishy is going on with Coliru, but you can take this code and run it locally: http://coliru.stacked-crooked.com/a/eb3e0c70cc138ecf


The next issue is that both of these algorithms are lazy; in the event that size(foo) is a multiple of 3, they allocate an empty vector at the end. That requires significant branching to remedy in the integer modulo algorithm, but only the simplest of changes in the repeated iterator subtraction algorithm. The resulting algorithms exhibit effectively equal benchmark numbers, but the edge goes to the repeated iterator subtraction for simplicity:

Integer modulo algorithm:

auto bar = cbegin(foo);
const auto size = distance(bar, cend(foo));

if (size <= 3) {
    for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}
else {
auto finish = prev(cend(foo), (size - 1) % 3 + 1);

    for (auto it = next(bar, 3); it != finish; bar = it, advance(it, 3)) {
        for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
        cout << endl;
    }

    for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
    for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

Repeated iterator subtraction algorithm:

auto bar = cbegin(foo);

for (auto it = cbegin(foo); distance(it, cend(foo)) > 3; bar = it) {
    advance(it, 3);
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

EDIT: Throwing the Remaining Size Algorithm into the hat

Both the Integer Modulo and Repeated Subtraction algorithms above suffer from iterating over the input sequence more than once. Beyond being slower, this isn't that serious while we're using a Bidirectional Iterator, but should our input iterator fail to qualify as a Bidirectional Iterator, this would be excessively expensive. Independent of iterator type, the Remaining Size Algorithm beats all challengers every time at 10,000,000+ testbench iterations:

auto bar = cbegin(foo);
auto i = size(foo);

for (; i > STEP; i -= STEP) {
    for (auto j = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
    cout << endl;
}

while (i--) cout << *bar++ << '\t';
cout << endl;

I've again copied my local testing to Coliru, which gives weird results but you can verify locally: http://coliru.stacked-crooked.com/a/361f238216cdbace

Lumbar answered 18/5, 2016 at 12:32 Comment(7)
Btw cmp instruction is "subtract and set flags (discard result)". What optimization level is this?Ultramicroscope
@BenVoigt Thanks for the info. I've added some bench-marking code, and you can imagine my surprise that even though your code always compiles to larger assembly code, it is faster, consistently. (Though I have no idea what's up with Coliru?) I want to accept your answer, but can you clean it up so that the try_advance doesn't fail to compile if I include both specializations?Lumbar
@BenVoigt Nope, doesn't work. It has been my experience that variadic arguments can only break ties if they are required by the call. You can't even use them to tie-break against defaulted arguments.Lumbar
Ok, fixed by having a user-defined conversion needed for the less favorable overload.Ultramicroscope
@BenVoigt I need to be able to use distance(it, cend(foo)) > 3 in order to prevent an extra allocation which makes this algorithm competitive with the integer modulo algorithm. But that makes the performance dreadful for forward iterators. Is there anything that you know of that can be done here? By the way, it makes an awful lot of sense to use iterator traits since we're using iterators.Lumbar
@BenVoigt OK I have a new challenger, which seems to beat all comers my answer has been updated with the Repeated Subtraction Algorithm. This algorithm also gets around the problems with Forward Iterators. Comments?Lumbar
Your last coliru code (coliru.stacked-crooked.com/a/361f238216cdbace) seems to largely be a comparison of resize+emplace vs reserve + push_back, and both would appear to benefit from using emplace_back rather than push_back.Signal